Post-Tournament
How was the final bracket created?
The final bracket was a seeded, single-elimination tournament among the top 64 cards by Elo from the Top 10% Tournament. As an example, the #1 seed (Urza's Saga) was paired first against the 64th seed (Settle the Wreckage).
The precise bracket order (and the pairing list within each matchup) is shuffled, although traditional seeded-bracket properties are retained (e.g., the 1st and 2nd seeds can't face each other until the finals). This was done to disguise the seeded nature of the bracket: if voters could suss out the exact seeding, it would likely spoil the early matchups, where upsets were less likely.
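One way to get both properties is sketched below in Python. It builds the standard seeded order (1 vs 64, 32 vs 33, and so on) and then randomly flips subtrees of the bracket. That changes the display order without changing who can meet whom. The function names `seed_order`, `shuffle_bracket`, and `matchups` are hypothetical; this illustrates the idea and is not the site's actual code.

```python
import random


def seed_order(n: int) -> list[int]:
    """Standard seeded order: adjacent entries are first-round matchups."""
    if n < 2 or n & (n - 1):
        raise ValueError("bracket size must be a power of two >= 2")
    order, size = [1], 1
    while size < n:
        size *= 2
        # Each seed s is paired with its mirror seed (size + 1 - s).
        order = [x for s in order for x in (s, size + 1 - s)]
    return order


def shuffle_bracket(order: list[int], rng: random.Random) -> list[int]:
    """Randomly swap the two halves of every subtree.

    Swapping sibling subtrees only changes where a branch is drawn, so the
    set of possible meetings (e.g. seeds 1 and 2 only in the final) is kept.
    """
    if len(order) == 1:
        return list(order)
    half = len(order) // 2
    left = shuffle_bracket(order[:half], rng)
    right = shuffle_bracket(order[half:], rng)
    return right + left if rng.random() < 0.5 else left + right


def matchups(order: list[int]) -> list[tuple[int, int]]:
    """Group the flat order into first-round pairs."""
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


if __name__ == "__main__":
    seeded = seed_order(64)
    print(matchups(seeded)[:2])  # [(1, 64), (32, 33)]
    print(matchups(shuffle_bracket(seeded, random.Random())))
```

Because only sibling subtrees are swapped, every first-round pair stays intact and seeds 1 and 2 always stay in opposite halves.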
Why are the Elos different between the Top 10% queue and the overall queue in the Card Browser?
The "All Cards" Elos reflect the Elos at the conclusion of the initial queues. These cards then entered the second, top 10% tournament, and the Elo on that page reflects their new Elo after that tournament.
How were the top 10% of cards selected? What happened to their Elo from the queues?
I selected the top 10% of cards per queue, rather than the top 10% overall. Despite my best efforts, some queues received more votes per card than others. A consequence of an Elo system is that the rating distribution develops wider tails as votes accumulate, which meant that the top 10% of cards in queues with higher vote counts had a higher average Elo. Hence, it seemed fairer to select per queue.
When entering the top 10%, I set each card's starting Elo to its Elo from the regular queues, but I normalized the Elos so that each queue had the same average top 10% Elo.
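A minimal sketch of the per-queue cut and normalization follows. It assumes the normalization is a simple additive shift that moves each queue's top-10% mean onto the average of all queues' top-10% means; the exact method isn't specified here. `per_queue_top_cut` and its data layout are hypothetical.

```python
from statistics import mean


def per_queue_top_cut(queues: dict[str, dict[str, float]],
                      frac: float = 0.10) -> dict[str, float]:
    """Return starting Elos for the top-`frac` cards of every queue.

    queues maps queue name -> {card name: Elo}. Card names are assumed
    unique across queues.
    """
    cuts = {}
    for queue, elos in queues.items():
        k = max(1, round(len(elos) * frac))  # cut per queue, not overall
        ranked = sorted(elos.items(), key=lambda kv: kv[1], reverse=True)
        cuts[queue] = dict(ranked[:k])

    # Assumed normalization: shift each queue so its top-cut average
    # equals the mean of all queues' top-cut averages.
    target = mean(mean(cut.values()) for cut in cuts.values())
    starting = {}
    for cut in cuts.values():
        shift = target - mean(cut.values())
        for card, elo in cut.items():
            starting[card] = elo + shift
    return starting
```

Because the shift is additive, the order of cards within a queue is unchanged. Only the queue's overall level moves.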
What's the deal with Growth Spiral and Grief?
Minor issue for the keen-eyed observer: in the top 10% browser, you'll see that Growth Spiral is listed as the #64 card, but it wasn't in the top 64 bracket. In the hour or so between when I finalized the bracket logic and when I launched the live site, votes continued to arrive (albeit at a slow pace), and a few votes against Grief bumped it out of the top 64 after it had already been locked into the bracket. Small hiccups like this are inevitable in a tournament with 100% uptime. Apologies to all fans of UG Ramp.
Elo Tournament
What is the inspiration for this project?
In 2016, Reddit user u/SaviaWanderer noticed that there were almost exactly 2^14 (16,384) unique Magic cards, a power of two and therefore a perfect size for a single-elimination bracket, and decided to run one covering every card printed up to that point. It was eventually won by Lightning Bolt. Nine years later, we've very nearly doubled that number of cards, and I'd like to see if the second half of Magic cards has produced cards as iconic as the first half.
What cards are being ranked? Why not rank all the cards?
This time around, I'm only including cards that weren't included in the original bracket. In practice, this means every non-reprint card from Aether Revolt onwards. This is for two reasons:
Why not use a single elimination bracket?
As mentioned above, a bracket is very slow. Additionally, the original rating system was very rigid: to participate, you had one chance to vote per day, and you had to vote on a fixed number of cards. If you lost interest before finishing the day's poll, or finished the poll and wanted to rate more cards, you were out of luck.
The hope is that a more flexible system will allow Magic players to vote as often or as rarely as they like.
How does the new system work?
The new system is based on the Elo rating system used in chess. After each vote, the winner gains rating and the loser loses rating, with the size of the change based on the Elo gap before the vote: winning against a card that people vote for a lot is worth more than winning against draft chaff nobody cares about. Over time, this sorts cards by popularity.
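For readers unfamiliar with Elo, here is a minimal sketch of a standard chess-style update. The K-factor of 32 and the 400-point scale are conventional chess values assumed for illustration; the FAQ doesn't state which constants this tournament uses.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Probability that `rating` beats `opponent` under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return new (winner, loser) ratings; points move zero-sum."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


if __name__ == "__main__":
    print(elo_update(1500, 1500))  # (1516.0, 1484.0)
    # A favorite beating an underdog gains little...
    print(elo_update(1700, 1350))  # winner gains about 3.8
    # ...but an upset moves both cards much further.
    print(elo_update(1350, 1700))  # winner gains about 28.2
```

This asymmetry is also what makes the "spite voting" described later effective: one upset vote costs a favorite far more than one expected win earns it.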
What cards are available to vote on?
At any time, there will be an “active queue” with at least 750 cards in it, and it is from this pool of cards that pairings are drawn. Cycling through these queues ensures that repeat visitors will have new cards to vote on and rank, and allows even late arrivals to participate in the initial ranking of cards. (It also makes it more likely you’ll see the same cards as your friends.)
How are pairings created?
The first card in each pair is drawn at random, with a slight bias toward higher-Elo cards. The second card is drawn using a two-stage process.
In the first stage, we look at Elo. Each matchup has a chance of applying one of several Elo restrictions, which limit the second card to one that is close-ish in rating.
In the second stage, we look at card similarity. Rather than pairing any two cards together, I'm specifically drawing a card that is mechanically similar. This should reduce the number of "Black Lotus vs. vanilla Grizzly Bears" matchups that are boring and one-sided. That said, you can use a slider to determine how much you care about this.
(There are a fair number of asterisks and minor statistical corrections under the hood to minimize bias; these are explained at the end of the FAQ.)
How is card similarity calculated?
I measure two axes of similarity: the first tracks card characteristics (color, mana value, type, etc.), and the second tracks card text (with special weighting for keyword abilities and keyword actions). The relative weighting of these two axes is set by a weighting parameter, and each pairing randomly chooses one of three different models (each with different weights) to draw cards from.
There are a couple of smaller adjustments too: cards from the same set are slightly penalized to discourage them from appearing together, and Planeswalkers are slightly more likely to be paired together than their type line and text would otherwise imply.
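A minimal sketch of how the two axes, the three models, and the two adjustments might combine. Everything concrete here is an assumption for illustration: the `Card` fields, cosine similarity on precomputed feature vectors, the three weights in `MODEL_WEIGHTS`, and the size of the same-set and Planeswalker adjustments are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    set_code: str
    is_planeswalker: bool
    traits: tuple[float, ...]  # hypothetical: color, mana value, type, ...
    text: tuple[float, ...]    # hypothetical: rules-text / keyword vector


MODEL_WEIGHTS = (0.3, 0.5, 0.7)  # hypothetical weight on the traits axis
SAME_SET_PENALTY = 0.05          # hypothetical
PLANESWALKER_BONUS = 0.05        # hypothetical


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(a: Card, b: Card, traits_weight: float) -> float:
    score = (traits_weight * cosine(a.traits, b.traits)
             + (1 - traits_weight) * cosine(a.text, b.text))
    if a.set_code == b.set_code:
        score -= SAME_SET_PENALTY    # discourage same-set pairings
    if a.is_planeswalker and b.is_planeswalker:
        score += PLANESWALKER_BONUS  # nudge Planeswalkers together
    return score


def most_similar(card_a: Card, pool: list[Card], rng: random.Random,
                 frac: float = 0.25) -> list[Card]:
    """Pick one model at random, then return the closest `frac` of the pool."""
    weight = rng.choice(MODEL_WEIGHTS)
    others = [c for c in pool if c != card_a]
    others.sort(key=lambda c: similarity(card_a, c, weight), reverse=True)
    return others[: max(1, int(len(others) * frac))]
```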
What cards are included?
Basically everything. I include standard sets, supplemental sets, Universes Beyond sets, Secret Lair exclusives, Un-Sets, and Holiday cards.
I do exclude some supplemental card types, e.g., Planes, Conspiracies, Contraptions, and Schemes. I also exclude the Playtest cards: there are a huge number of these, and early feedback and votes suggested players didn't like seeing them.
What's the next step?
Once we have enough votes to rank all the individual queues, we’ll cut to the top ~10% of cards for a final Elo competition among our top cards. This queue will remain open for a longer period of time, and the top performers will enter a final bracket with 64 or 128 cards to find the true “best” card of the second half of Magic.
Pairing Creation – More Statistical Detail
Under the hood, there are a lot of small adjustments to try to correct for bias while making the bracket more interesting. Let's talk about them.
Card A Selection
When a user first loads the page, we shuffle the full list of cards in the queue and create a "Card A" list that contains all 750 cards (or however large the queue size is) exactly once. Each time you vote, we draw the next card in that list, and choose another card to pair against it based on Elo, card similarity, and our various adjustments (described later).
The cards on this list aren't shuffled uniformly, though: cards are placed within the list via a weighted draw, with a significant weighting for Elo and a minor weighting for in-degree bias.
First, cards with higher Elos make for more interesting pairings, and it's more important to find out the exact ranking of the top 100 cards than of the bottom 100 cards. When we "draw" cards to place onto the Card A list, their chances of appearing are weighted by the multipliers below. Note that every card must appear exactly once, so if you run the full queue to exhaustion (i.e., vote 750 times in a row), you'll start to see a lot of bad cards toward the end. On the other hand, these weightings aren't so extreme that low-performing cards won't show up toward the beginning of your voting; they're just less likely to.
Also note that the Card A list is finalized when you first open your session. If you start voting as soon as the queue rotates, the "better" cards won't have had time to sort toward the top.
| Elo percentile within pool | Multiplier |
|---|---|
| ≤ 0.10 | ×0.45 |
| 0.20 | ×0.65 |
| 0.30 | ×0.75 |
| 0.50 | ×1.00 |
| 0.70 | ×1.10 |
| 0.90 | ×1.25 |
| ≥ 0.99 | ×1.35 |
The second corrective factor gives a 10% higher chance for cards in the bottom 10% of in-degree similarity. This will be discussed later.
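A minimal sketch of building the Card A list as a sequence of weighted draws without replacement follows. Every card appears exactly once, but higher-weighted cards tend to come earlier. Two details are assumptions: multipliers between the listed percentiles are linearly interpolated, and the in-degree bonus is modeled as a flat ×1.10 weight. `card_a_order` and its arguments are hypothetical names.

```python
import random

# (Elo percentile, multiplier) points from the table above.
ELO_POINTS = [(0.10, 0.45), (0.20, 0.65), (0.30, 0.75), (0.50, 1.00),
              (0.70, 1.10), (0.90, 1.25), (0.99, 1.35)]
LOW_IN_DEGREE_BONUS = 1.10  # assumed form of the "10% higher chance"


def elo_multiplier(pct: float) -> float:
    """Assumed: linear interpolation between the table's points."""
    if pct <= ELO_POINTS[0][0]:
        return ELO_POINTS[0][1]
    if pct >= ELO_POINTS[-1][0]:
        return ELO_POINTS[-1][1]
    for (p0, m0), (p1, m1) in zip(ELO_POINTS, ELO_POINTS[1:]):
        if p0 <= pct <= p1:
            return m0 + (m1 - m0) * (pct - p0) / (p1 - p0)
    raise AssertionError("unreachable")


def card_a_order(elos: dict[str, float], low_in_degree: set[str],
                 rng: random.Random) -> list[str]:
    """Weighted draw without replacement over the whole queue."""
    ranked = sorted(elos, key=elos.get)  # lowest Elo first
    n = len(ranked)
    weights = {}
    for rank, card in enumerate(ranked):
        pct = rank / (n - 1) if n > 1 else 1.0
        weight = elo_multiplier(pct)
        if card in low_in_degree:
            weight *= LOW_IN_DEGREE_BONUS
        weights[card] = weight

    remaining, order = list(ranked), []
    while remaining:
        pick = rng.choices(remaining, weights=[weights[c] for c in remaining])[0]
        order.append(pick)
        remaining.remove(pick)
    return order
```

Drawing one card at a time is quadratic in the queue size, which is trivial for a 750-card queue.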
Card B Elo Restriction
Each time we select Card A, we apply one of several Elo restrictions, which limit how far Card B can be from Card A in Elo. The reason is to reduce the number of matchups between our most popular cards and chaff. Not only are these matchups less interesting, but they also don't change the Elos very much. The one exception is that skewed matchups make "spite voting" much easier: if you hate Quantum Riddler and it's paired against a 1350 Elo card, voting for the "worse" card can drag the higher-Elo card down much further than a single positive vote would help it.
The exact set of restrictions is below:
| Chance of restriction | Window around Card A (Elo percentile) | Fraction of pool |
|---|---|---|
| 25% | ±5 percentiles | 10% of pool |
| 25% | ±20 percentiles | 40% of pool |
| 25% | ±40 percentiles | 80% of pool |
| 25% | No filter | Entire pool |
Note that these restrictions are applied first; card similarity is then calculated only among the cards left in the window.
In-Degree Similarity
The default similarity setting (the one nearly everyone uses) draws Card B from the 25% of cards most similar to Card A. However, not every card is equally likely to land in another card's closest 25%.
As an example, Blink Dog is very unusual. Structurally, it's a 3-mana 1/1 at uncommon. Its text references Double Strike and Phasing, which are both rare abilities. Consequently, it shows up in other cards' closest 25% only ~5% of the time. In contrast, Moonlit Scavengers has very common creature types and stats, and its text box involves bouncing creatures while also referencing artifacts and enchantments. It's "sort of close" to a TON of cards: two-thirds of the cards in its queue include it in their closest 25%.
All else equal, this would result in a bracket that paired Moonlit Scavengers as Card B far more often and Blink Dog far less often. To correct for this, when drawing Card B, we use a weighted draw that skews slightly toward cards with lower "in-degree" (i.e., cards that appear as related cards less often than we'd expect by chance). For the Broad setting, these weights cap at a 2.5x multiplier for cards with very low in-degree and bottom out at a 0.3x multiplier for cards with very high in-degree. As mentioned above, the bottom 10% of cards by in-degree also have a marginally higher chance of appearing earlier as Card A.
Note that on the "Wild" setting, Card B pairings are completely random (subject to the Elo restriction above), so none of this applies.
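A minimal sketch of the in-degree correction follows. It counts how often each card appears in other cards' closest-25% lists and compares that with the 25% you'd expect by chance. The mapping from in-degree to weight is an assumption: expected over observed, clipped to the 0.3x to 2.5x range quoted above. Under that mapping, a card like Blink Dog (~5%) would hit the 2.5x cap, and one like Moonlit Scavengers (~67%) would get roughly 0.375x. For brevity the sketch ignores the Elo window applied beforehand, and the function names are hypothetical.

```python
import random


def in_degree_weights(closest: dict[str, list[str]], frac: float = 0.25,
                      floor: float = 0.3, cap: float = 2.5) -> dict[str, float]:
    """closest maps each card to the cards in its closest-`frac` list."""
    counts = {card: 0 for card in closest}
    for neighbors in closest.values():
        for other in neighbors:
            counts[other] += 1
    # Expected count if every card were equally likely to be "close".
    expected = frac * (len(closest) - 1)
    weights = {}
    for card, count in counts.items():
        raw = expected / count if count else cap
        weights[card] = min(cap, max(floor, raw))  # assumed clipping rule
    return weights


def draw_card_b(card_a: str, closest: dict[str, list[str]],
                weights: dict[str, float], rng: random.Random) -> str:
    """Weighted draw from Card A's similar cards, favoring low in-degree."""
    candidates = closest[card_a]
    return rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
```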
Ballot Order Effect
The Ballot Order effect refers to the phenomenon wherein voters are more likely to choose whichever option is listed first. In order to combat this, the order that "Card A" and "Card B" are displayed to the user is shuffled for every vote.
Empirically, in this tournament, the card that appears on the left wins about 53.6% of the time. I haven't investigated yet, but I suspect this effect is stronger for lower rated cards that voters don't have a strong opinion on.
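A minimal sketch of the display shuffle follows. It also shows one way to check a left-side win rate like the 53.6% above: a normal-approximation 95% confidence interval on the share of votes won by the left-hand card. The vote-log format and function names are hypothetical.

```python
import math
import random


def present(card_a: str, card_b: str, rng: random.Random) -> tuple[str, str]:
    """Return (left, right) in random order so neither slot is favored."""
    pair = [card_a, card_b]
    rng.shuffle(pair)
    return pair[0], pair[1]


def left_win_rate(votes: list[tuple[str, str, str]],
                  z: float = 1.96) -> tuple[float, float, float]:
    """votes holds (left, right, winner); returns (rate, low, high)."""
    n = len(votes)
    wins = sum(1 for left, _, winner in votes if winner == left)
    rate = wins / n
    half = z * math.sqrt(rate * (1 - rate) / n)  # normal approximation
    return rate, rate - half, rate + half
```

For instance, 536 left-side wins out of 1,000 invented votes gives an interval of roughly 0.505 to 0.567, which excludes 0.5.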
Intra-Queue Elo
Because each queue has a slightly different number of votes and a randomly varying standard of competition, Elos from one queue to the next represent distinct tournament environments and aren't directly comparable. While this effect is minor, it could slightly affect which cards make the "top 10%" cut for the final queue. To correct for this, the top 10% cut will be done on a per-queue basis rather than an overall basis, so that every queue contributes the same number of cards to the final tournament.