Monday, 28 January 2013

Simulating Metagame Evolution and PT Return to Ravnica

The tl;dr

Generally:
  • Beating the most popular deck is overrated
  • If your goal is winning the tournament (as opposed to top 8ing or doing well generically) then the optimal strategy may be very different.
For PT: RTR
  • Eggs was the best deck in round 1.
  • Poison was the best deck in round 10 of Modern.
  • Scapeshift, Robots and UW were worse than you know

Table of Contents

  1. Introduction
  2. The Rock Paper Scissors Example (Mascoli 2012)
  3. PT: RTR
    3.1 Relevant Assumptions
    3.2 Simulation Results
    3.3 Theoretical Results (to be updated at a later date)
    3.4 Final Notes

1. Introduction

Ask ten pros and you will get ten answers on how and whether you should metagame. However, almost all of us (or, if I choose to keep it real, “them”) are answering mostly based on experience and intuition. Until recently I thought that was fine. The truth is far more interesting.

Chris Mascoli (check Gatheringmagic.com) recently published an article on metagaming and Magic that was the entire impetus for the work I have done here. In the comments on the article, Mike Flores was credited with some ancient work that first laid out the idea. I hope to build on what they have started.

I am going to try to give more explanation, do more theoretical work (as opposed to pure simulations) and finally apply it to the most recent Pro Tour. The conclusions are hopefully interesting and unobvious enough to be worth devoting a mammoth post to. The program I used to simulate the results was created completely independently of Chris’ work, which conveniently gives me a way to test the validity of the program (assuming Chris’ work was also correct).

I summarize the key insights as numbered Metagame Rules below.

2. Advanced Rock Paper Scissors

Consider a rock paper scissors tournament where you pick one strategy and must play that option every round. The tournament uses standard Swiss rules, and a mirror match is 50/50. What is the optimal deck if the tournament featured 300 players and 8 rounds with the following metagame:
  • 40% Rock
  • 33% Scissors
  • 27% Paper
Chart 1. Metagame Shares

Deck     | Original Metagame | % of Top 8 (Chris) | % of Top 8 (Lucas) | % of Winners (Chris) | % of Winners (Lucas)
Rock     | 40.00%            | 13.57%             | 14.85%             | 13.33%               | 14.10%
Paper    | 26.67%            | 55.43%             | 53.90%             | 16.36%               | 17.70%
Scissors | 33.33%            | 31.01%             | 31.25%             | 70.31%               | 68.20%



I have included both the results of my simulation (1000 trials) and Chris’ so that we can gain a little confidence that the algorithm is working (admittedly under some assumptions). Using 8 rounds was a small oversight by Chris (a 300-player Swiss event would normally run 9 rounds), but since the example still provides the intuition we are looking for, I decided to run with it.
The key results are that:
  • Paper is the best deck for top8ing
  • Scissors is the play if you want to win
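The tournament structure above can be sketched as a small Monte Carlo simulation. This is not the program behind Chart 1; it is a minimal sketch under stated assumptions: no draws, Swiss pairings approximated by sorting on points (random order within a score group, rematches allowed, no byes), real tiebreakers replaced by a random order, and a seeded single-elimination top 8. The names `RPS`, `run_event` and `simulate` are hypothetical.

```python
import random
from collections import Counter

# Probability that the row deck beats the column deck; mirrors are 50/50.
RPS = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}


def beats(a, b, matrix, rng):
    """Play one match: True if deck a beats deck b (no draws modelled)."""
    return rng.random() < matrix[a][b]


def run_event(field, matrix, rounds, rng):
    """One simplified Swiss event plus a top 8; returns (top 8 decks, winner)."""
    points = [0] * len(field)
    for _ in range(rounds):
        # Sort by points (random order inside a score group) and pair neighbours.
        # Rematches are not prevented; with an odd field the last player sits out.
        order = sorted(range(len(field)), key=lambda i: (-points[i], rng.random()))
        for i, j in zip(order[0::2], order[1::2]):
            if beats(field[i], field[j], matrix, rng):
                points[i] += 3
            else:
                points[j] += 3
    # Top 8 by points; real tiebreakers are replaced by a random order.
    cut = sorted(range(len(field)), key=lambda i: (-points[i], rng.random()))[:8]
    top8 = [field[i] for i in cut]
    bracket = list(top8)
    while len(bracket) > 1:
        # Seed 1 plays 8, 2 plays 7, ...; winners advance in seed order.
        half = len(bracket) // 2
        bracket = [a if beats(a, b, matrix, rng) else b
                   for a, b in zip(bracket[:half], reversed(bracket[half:]))]
    return top8, bracket[0]


def simulate(counts, matrix, rounds=8, trials=1000, seed=0):
    """Estimate each deck's share of top 8 slots and of tournament wins."""
    rng = random.Random(seed)
    field = [deck for deck, n in counts.items() for _ in range(n)]
    top8_slots, wins = Counter(), Counter()
    for _ in range(trials):
        top8, winner = run_event(field, matrix, rounds, rng)
        top8_slots.update(top8)
        wins[winner] += 1
    top8_share = {d: top8_slots[d] / (8 * trials) for d in counts}
    win_share = {d: wins[d] / trials for d in counts}
    return top8_share, win_share


if __name__ == "__main__":
    top8_share, win_share = simulate({"Rock": 120, "Paper": 80, "Scissors": 100}, RPS)
    for deck in RPS:
        print(f"{deck:9s} top 8: {top8_share[deck]:6.2%}  wins: {win_share[deck]:6.2%}")
```

Passing player counts rather than percentages keeps the field at exactly 300 players. Swapping in a different matrix and field is all it takes to point the same loop at another event.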
Metagame Rule 1: A popular but poorly situated deck will see its metagame share shrink over the course of the tournament, while the best-situated decks become over-represented relative to their initial popularity. The metagame evolves.
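Rule 1 can also be seen without any randomness. The sketch below assumes an infinitely large field in which undefeated players are paired at random with each other, so after each round a deck's share of the undefeated bracket is proportional to its current share times its expected win rate against that bracket (essentially a discrete replicator update). This is a simplification of Swiss, not the method behind the charts, and `undefeated_shares` is a hypothetical helper. Under these assumptions rock's share of the undefeated players starts falling after round 2, while paper's climbs over the first several rounds.

```python
# Probability that the row deck beats the column deck (Chart 3).
RPS = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}


def expected_win_rate(deck, shares, matrix):
    """Chance that `deck` beats an opponent drawn at random from `shares`."""
    return sum(p * matrix[deck][opp] for opp, p in shares.items())


def undefeated_shares(shares, matrix, rounds):
    """Deck shares among undefeated players, before round 1 and after each round.

    Assumes an infinite field and random pairing within the undefeated
    bracket, so next share is proportional to share x expected win rate.
    """
    history = [dict(shares)]
    for _ in range(rounds):
        current = history[-1]
        survivors = {d: p * expected_win_rate(d, current, matrix)
                     for d, p in current.items()}
        total = sum(survivors.values())
        history.append({d: s / total for d, s in survivors.items()})
    return history


if __name__ == "__main__":
    start = {"Rock": 120 / 300, "Paper": 80 / 300, "Scissors": 100 / 300}
    for rnd, shares in enumerate(undefeated_shares(start, RPS, 5)):
        print(rnd, {d: round(p, 3) for d, p in shares.items()})
```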

Metagame Rule 2: This significantly impacts who top 8s (and as a corollary who wins). Beating the most popular deck is good for getting a top 8. But to win you want to beat the well situated deck. Winning and doing well (defined as top 8) should be treated as different goals.

Chart 2 shows what the above percentages mean from the perspective of a specific individual. For example, if I told you that Paper and Rock each made up 33% of the top 8, you might think there was no advantage to picking one deck over the other. But the two strategies had different starting positions. While we would expect about 2.67 players (a third of 8) of each strategy in the top 8, more people started with rock than with paper, so showing up with rock is worse than showing up with paper for each individual rock player.

Chart 2. Player Value

Representative Tournament | Original Metagame (# Players) | # Players in Top 8 (Lucas) | # of Winners
Rock                      | 120                           | 1.20                       | 0.14
Paper                     | 80                            | 4.32                       | 0.18
Scissors                  | 100                           | 2.48                       | 0.68

In this case we can see that although roughly an equal number of tournaments are won by Paper and Rock, you personally are much more likely to win if you play paper (since fewer of them exist).
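The per-player reading of Chart 2 is each deck's expected top 8 slots (or wins) divided by its number of pilots. A minimal sketch, assuming every pilot of a deck is equally likely to succeed; the helper name `per_player` is hypothetical and the numbers are copied from Chart 2.

```python
# Chart 2, representative 300-player tournament (values copied from the chart).
PLAYERS = {"Rock": 120, "Paper": 80, "Scissors": 100}
TOP8 = {"Rock": 1.20, "Paper": 4.32, "Scissors": 2.48}
WINNERS = {"Rock": 0.14, "Paper": 0.18, "Scissors": 0.68}


def per_player(expected, players):
    """Spread a deck's expected top 8 slots (or wins) evenly over its pilots."""
    return {deck: expected[deck] / players[deck] for deck in players}


if __name__ == "__main__":
    top8_each = per_player(TOP8, PLAYERS)
    win_each = per_player(WINNERS, PLAYERS)
    for deck in PLAYERS:
        print(f"{deck:9s} top 8: {top8_each[deck]:.2%}  win: {win_each[deck]:.3%}")
```

A paper pilot's chance of winning works out to 0.18 / 80 = 0.225%, roughly double a rock pilot's 0.14 / 120 ≈ 0.117%, even though the two decks win about the same number of events.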

Metagame Rule 3: Whether a deck is good does not depend on how much of the top 8 it is expected to make up, but on whether the deck increases its metagame percentage from its initial position, and by how much. You would rather be on a deck that was 5% of the initial metagame and 10% of winners than on one that was 50% of the metagame and 50% of winners. This is important when we analyze the actual Pro Tour.
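Rule 3 can be made precise with a little algebra. Write N for the number of players, n_d for the number playing deck d, f_d = n_d / N for its initial metagame share, and w_d for its share of tournament wins (equivalently its expected number of winners, since each event has one). Assuming every pilot of a deck is equally likely to win, an individual pilot's chance is p_d, while with no deck advantage everyone would have p_0:

```latex
p_d = \frac{w_d}{n_d}, \qquad
p_0 = \frac{1}{N}, \qquad
\frac{p_d}{p_0} = \frac{w_d \, N}{n_d} = \frac{w_d}{f_d}
```

So a deck at 5% of the field and 10% of winners gives each pilot 0.10 / 0.05 = 2 times the baseline chance, while a deck at 50% of both gives exactly 1. In Chart 2, scissors works out to 0.68 / 0.333 ≈ 2.04. The same ratio holds for top 8 shares, since the factor of 8 appears in both p_d and p_0 and cancels.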

3. Pro Tour: Return to Ravnica

Assumptions for Simulation
  • There were 382 players who participated in every round
  • The tournament was 10 continuous rounds of Modern and had no Limited portion
  • The win percentage matrix (see below)
  • The metagame consisted of all decks with an initial metagame share greater than 1.5%, and all other decks are lumped into “Other”
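The “Other” bucketing rule above is easy to state in code. A minimal sketch; the helper name `lump_small_decks` and the deck names and counts in the usage are hypothetical, not the actual PT field.

```python
def lump_small_decks(counts, threshold=0.015):
    """Keep decks whose share of the field is greater than `threshold`;
    merge every other deck (and any existing "Other" entry) into "Other"."""
    total = sum(counts.values())
    merged = {}
    for deck, n in counts.items():
        key = deck if deck != "Other" and n / total > threshold else "Other"
        merged[key] = merged.get(key, 0) + n
    return merged


if __name__ == "__main__":
    # Hypothetical field of 100 players.
    print(lump_small_decks({"Deck A": 90, "Deck B": 8, "Deck C": 1, "Other": 1}))
```

With 382 players the 1.5% cut falls between 5 pilots (about 1.3%) and 6 pilots (about 1.6%), so any deck with 6 or more pilots keeps its own row.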
The Win Percentage Matrix is a table that gives, for every pair of decks, the probability that the deck on the Y-axis (row) beats the deck on the X-axis (column). For Rock-Paper-Scissors it looks like:

Chart 3. Win Percentages RPS

Win Probability | Rock | Paper | Scissors
Rock            | 50%  | 0%    | 100%
Paper           | 100% | 50%   | 0%
Scissors        | 0%   | 100%  | 50%
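A matrix like this is naturally stored as a nested dictionary, and two consistency rules follow from “a mirror match is 50/50” plus the assumption that there are no draws: every diagonal entry is 0.5, and P(a beats b) + P(b beats a) = 1 for every pair. A minimal sketch of a checker (the name `validate_matrix` is hypothetical); a real Magic matrix with draws would only satisfy the second rule approximately.

```python
# Probability that the row deck beats the column deck (Chart 3).
RPS = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}


def validate_matrix(matrix, tol=1e-9):
    """Raise ValueError unless every row covers every deck, mirrors are 50/50,
    and the two entries for each pairing sum to 1 (i.e. no draws)."""
    decks = set(matrix)
    for a, row in matrix.items():
        if set(row) != decks:
            raise ValueError(f"row {a!r} does not cover every deck")
        if abs(row[a] - 0.5) > tol:
            raise ValueError(f"mirror {a!r} is not 50/50")
        for b in decks:
            if abs(row[b] + matrix[b][a] - 1.0) > tol:
                raise ValueError(f"{a!r} vs {b!r} entries do not sum to 1")
    return True


if __name__ == "__main__":
    print(validate_matrix(RPS))
```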

For PT RTR it looks like:
Chart 4. Win Percentages Modern



Some of this was filled out using Paul Jordan’s metagame article and some of it was just my subjective best guess. I tried two different versions. In version 1 I used my best guesses for rough win percentages. In version 2 (the one above), I tweaked version 1 until each deck's probability of winning a random match was equal (or close) to its actual overall win percentage at the Pro Tour (again from Paul’s article). The big difference between the versions is in how decks are treated when they play the nebulous “Other” decks. Chart 5 shows the overall win percentage implied by these assumptions and how it compares to what actually happened. This is simply a sanity check.
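The sanity check can be written as one formula. Write f_j for deck j's initial metagame share and M_{ij} for the matrix entry (the chance that row deck i beats column deck j, mirror included). The chance that deck i wins a match against an opponent drawn at random from the starting field is W_i; the Rock-Paper-Scissors field with the Chart 3 matrix serves as a worked example:

```latex
\begin{aligned}
W_i &= \sum_j f_j \, M_{ij} \\
W_{\text{Rock}} &= 0.40(0.5) + 0.2667(0) + 0.3333(1) \approx 0.533 \\
W_{\text{Paper}} &= 0.40(1) + 0.2667(0.5) + 0.3333(0) \approx 0.533 \\
W_{\text{Scissors}} &= 0.40(0) + 0.2667(1) + 0.3333(0.5) \approx 0.433
\end{aligned}
```

Tuning version 2 amounts to adjusting entries of M until each W_i sits close to the deck's observed win percentage. As a further check, when there are no draws (M_{ij} + M_{ji} = 1), the field-weighted average \sum_i f_i W_i is always exactly 0.5; here 0.4(0.533) + 0.267(0.533) + 0.333(0.433) ≈ 0.5.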

Chart 5. Comparing Assumed (based on above) vs Actual Win percentages



Notes from the previous 2 charts
  • Affinity and Scapeshift have very good matchups against Jund, but not great matchups elsewhere
  • The two most popular decks have below average win percentages (Jund and Other)
  • Eggs has pretty much universally favorable matchups
  • Poison has a high overall win percentage but a poor Jund matchup
Simulation Results

This shows us how we should expect the tournament to shake out in terms of composition.

Chart 6. Metagame and Top 8 Shares

The following gives us an idea of what the best decks are. The added value measure calculates your advantage compared with assuming that each person is exactly equally likely to top 8 (or win) the tournament. 100% means you are twice as likely to top 8 as someone who gets no advantage or disadvantage from deck choice; −50% means you are half as likely. The probabilities are the chance that a given individual would top 8 (or win). In other words, if I had chosen to play Jund at PT RTR I would have given myself a 0.1% chance of winning, which is distinct from Jund having an 11.8% chance of winning overall.
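The added value measure and the individual-versus-deck distinction can be sketched in a few lines. This assumes the baseline is slots / players (8 for top 8, 1 for winning) and that every pilot of a deck has the same individual chance; `added_value` and `deck_win_probability` are hypothetical names.

```python
def added_value(p_individual, players, slots):
    """Edge of one pilot over the no-deck-advantage baseline.

    slots is 8 for making top 8 and 1 for winning; the baseline chance is
    slots / players. 1.0 means twice the baseline, -0.5 means half of it.
    """
    baseline = slots / players
    return p_individual / baseline - 1


def deck_win_probability(p_individual, pilots):
    """Chance that some pilot of the deck wins the event.

    Only one player can win, so the pilots' chances simply add up
    (assuming every pilot of the deck has the same individual chance).
    """
    return p_individual * pilots


if __name__ == "__main__":
    players = 382
    # Hypothetical pilot twice as likely as the baseline to make top 8.
    print(added_value(2 * 8 / players, players, slots=8))
    # Hypothetical pilot half as likely as the baseline to win.
    print(added_value(0.5 / players, players, slots=1))
```

Because `deck_win_probability` grows with the number of pilots, a popular deck can post a sizeable overall chance of winning while each individual pilot's chance stays small, which is exactly the Jund situation above.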

Chart 7. Finding the Best Deck


My Takeaways
  1. Poison wasn’t hurt much by its bad Jund matchup. The rest of the field slowly whittled Jund down and Poison was able to prey on the decks that were doing that (Scapeshift, Tron, Eggs etc.).
  2. Affinity was hurt because its main source of +EV (Jund) was disappearing in the later rounds. It also became a larger part of the metagame as the tournament went on and thus faced increasingly frequent mirror matches (making it harder for individual pilots to succeed).
  3. Only 15% of decks significantly improved upon the benchmark for the purposes of top 8ing.
  4. Edit (01/29): Eggs was the second-best deck choice for the tournament, but it was only the fourth most likely deck to win it. Of course, this ignores the possibility that Cifka's list was better than generic Eggs.
  5. The fact that there was so much Jund in the top 8 was probably due to above-average Limited performances (Ochoa 15, Edel 12, Yuya 15).
  6. Storm’s performance (which was fairly solid) suggests a lack of “average” Storm players. More likely, some people had very good decks and some had very bad versions.
Metagame Rule 4: In this environment most decks are bad choices. Arguably 85% of decks were below-average choices. If the most popular deck is not the “best deck,” the metagame decision is very important.

Metagame Rule 5: Being better with your deck (as opposed to trying to metagame) matters more when tournaments have fewer rounds. Imagine being a rock player with a 20% win percentage against paper and a 70% win percentage in the mirror. You still wouldn’t want to be in the average top 8 from the first example. In general we probably overestimate the value of being good with our pet decks. This will fall out of the theoretical work I plan to examine later.

Final Thoughts

I am going to add some further analysis based on theory to answer some hypotheticals that some may find interesting such as:
  • How much is being Yuya worth?
  • Does number of rounds matter a little or a lot?
  • Does the theory (which requires even more simplification) agree with the Simulations (spoiler: Science works)?
I would also like to run some numbers on another metagame where the “best deck” was actually the most popular deck. In what I expect will be the mother of all plot twists, I assume the math will say to play the best deck. Not sure where to find that stuff so I will do some digging. Maybe PT: Tempered Steel or Cawblade.

Monday, 14 January 2013

Modern UW Control

List as of PTQ:
2 Scalding Tarn
2 Arid Mesa
4 Celestial Colonnade
3 Seachrome Coast
1 Calciform Pools
1 Plains
4 Tectonic Edge
4 Hallowed Fountain
5 Island

3 Vendilion Clique
1 Snapcaster Mage
1 Restoration Angel
1 Consecrated Sphinx

1 Ratchet Bomb
2 Vedalken Shackles
1 Batterskull
1 Relic of Progenitus
2 Thirst for Knowledge
3 Supreme Verdict
2 Sphinx's Revelation
3 Path to Exile
3 Mana Leak
1 Remand
4 Cryptic Command
4 Spell Snare

SB:
4 Meddling Mage
2 Celestial Purge
1 Ratchet Bomb
1 Disenchant
1 Stony Silence
1 Negate
1 Spell Pierce
1 Path to Exile
1 Wurmcoil Engine
1 Rest in Peace
1 Linvala, Keeper of Silence

MVPs:
Ratchet Bomb
Calciform Pools
Vedalken Shackles

LVPs
C-Sphinx
Stony Silence
Path to Exile

Matchups
Twin 3-1 (loss was to RUG with Blood Moons)
Poison 1-0
MonoWhite 1-0
RUG Delver 1-0
Storm 2-0
American Midrange 0-1
Jund 1-0

I apologize for all the spelling and grammar mistakes which are sure to follow. They are my mistakes and exclusively the fault of the Canadian public school system.

I haven't tested the deck much because I brewed it up mostly an hour before the PTQ. But I think all those matchups are ones I am very happy to play.

The Genesis:
Early in the week I tried Jarvis and Fabiano's list for a couple of matches and wasn't happy with UW in general. I was losing in 2-mans to competent Jund (Cheon and Orsini Jones) and with the more creature heavy lists I felt only even against combo (as opposed to way ahead).

All the Wraths
Saturday evening I quickly 0-2ed with Zoo (went X-2 at the last PTQ) and dropped. Waking up Sunday, I reviewed the decks at the top tables and realized everyone was playing creatures. So I decided to play a bunch of cards that are insane against creatures: Shackles, Supreme Verdict and Revelation. This combination of cards is almost impossible for any fair deck to play around (Jund does it best). Unfortunately, you can't really afford to play creatures when you're on this strategy, so you need ways to close out the game, no questions asked (thus my choice of BOOMBOOM finishers).

Combo Me? Combo you!
I didn't want to play too many creatures (combo with wrath), and since I had many bricks I needed to shore up the combo matchups, so I played a crap ton of counters and a maindeck Relic. My few early threats were also highly disruptive, which helps since you often draw so many prices. V-Clique is also excellent with my boom booms since it easily strips their answer as you curve into your 6 (or 5). Thus I played Sphinx over Sun Titan, since I felt I would be able to make it stick more often than previously. The one Remand (#whatwouldOriedo?) is NOT an attempt to be cute. I actually think it's worse than Mana Leak (especially with Thirst in your deck), but I wanted to play two cantrips because I suspected 26 lands might be low.

Nice Spell Snares Bro
To alleviate the problem of drawing the wrong part of your deck in the wrong matchup, I had the Thirst for Knowledges. Where other people played Jace as a source of persistent card advantage, I wanted to play almost exclusively on their turn and use my three-drop card drawer to filter for the insane one-ofs that were pertinent in each matchup. They are also excellent with situational sideboard cards. For example, Spell Pierce is generally mediocre in the Twin matchup. Early on you can just lose to Blood Moon (so it excels there), but the majority of games go long, so the card is usually pretty bad. Similarly, you might want to bring in Disenchant against RUG Delver (in case of Shackles or Moon), but you don't want it rotting in your hand forever.

The only way Pikula Makes a top 8*.
Meddling Mage is the all-star of this board. In every combo matchup you end up post-board with essentially 10 disruptive creatures, 12 counterspells, 4 card drawers and some miscellaneous permanent hate. I like 1 Rest in Peace since it's insane against Storm (it's impossible for them to Grapeshot you, and this deck almost never loses to EtW), and I always bring it in against Jund. Stony Silence is obviously #synergy with my Shackles, wrath, Revelation anti-Affinity plan. I think the worst card in the board is probably the 4th Path, but it's almost certainly a necessary evil. Without Snapcasters you don't actually have very much removal.

Mana from the gods
The whole point of the deck was to make Shackles work, hence the 3 Seachromes. I am also petrified of Blood Moon, so I made the unusual choice to run 2 Arid Mesas. They usually cost you extra life, but I like the insurance. The Calciform Pools was busy firing off Revelations all day, and I hated essentially having 6 colorless sources, but it was probably the best one. The question isn't 25 or 26 lands. It's 26 or 27.
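The 26-versus-27 question is a hypergeometric calculation. A minimal sketch, assuming a 60-card deck, no mulligans, and ignoring Thirst for Knowledge, Remand and other card draw; `p_at_least` is a hypothetical helper. It estimates the chance of having hit six land drops by turn 6 (for Consecrated Sphinx) on the play.

```python
from math import comb


def p_at_least(lands, deck_size, seen, need):
    """Hypergeometric chance of at least `need` lands among the first `seen` cards."""
    hits = sum(comb(lands, k) * comb(deck_size - lands, seen - k)
               for k in range(need, min(lands, seen) + 1))
    return hits / comb(deck_size, seen)


if __name__ == "__main__":
    # On the play you have seen 7 + 5 = 12 cards by turn 6 (13 on the draw).
    for lands in (25, 26, 27):
        print(lands, f"{p_at_least(lands, 60, seen=12, need=6):.1%}")
```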

*This is a joke; Pikula was actually within a match of the top 8 in the Saturday PTQ, I believe (assuming his MODO nick is the obvious one). But rather than be whined at for lack of respect (#whathaveyoudoneetc), I figured I would add an asterisk and save myself some flame-mail. Because words mean things.