Monday, 28 January 2013

Simulating Metagame Evolution and PT Return to Ravnica

The tl;dr

Generally:
  • Beating the most popular deck is overrated
  • If your goal is winning the tournament (as opposed to top 8ing or doing well generically), the optimal strategy may be very different.
For PT: RTR
  • Eggs was the best deck in round 1.
  • Poison was the best deck in round 10 of Modern.
  • Scapeshift, Robots, and UW were worse than you think.

Table of Contents

  1. Introduction
  2. The Rock Paper Scissors Example (Mascoli 2012)
  3. PT: RTR
    3.1 Relevant Assumptions
    3.2 Simulation Results
    3.3 Theoretical Results (to be updated at a later date)
    3.4 Final Notes

1. Introduction

Ask ten pros how and whether you should metagame and you will get ten answers. However, almost all of us (or, if I choose to keep it real, “them”) are answering mostly from experience and intuition. Until recently I thought that was fine. The truth is far more interesting.

Chris Mascoli (see GatheringMagic.com) recently published an article on metagaming in Magic, which was the entire impetus for the work I have done here. In the comments surrounding the article, Mike Flores was credited with much earlier work that originally exposited the idea. I hope to build on what they have started.

I am going to try to give more explanation, do more theoretical work (as opposed to pure simulation), and finally apply it to the most recent Pro Tour. The conclusions are hopefully interesting and unobvious enough to be worth devoting a mammoth post to. The program I used to simulate the results was created completely independently of Chris’ work, which conveniently gives me a way to test the program’s validity (assuming Chris’ work was also correct).

I summarize the key insights in the bolded Metagame Rules below.

2. Advanced Rock Paper Scissors

Consider a rock-paper-scissors tournament where you pick one strategy and must play that option every round. The tournament is run under standard Swiss rules, and a mirror match is 50/50. What is the optimal deck if the tournament features 300 players, 8 rounds, and the following metagame?
  • 40% Rock
  • 33% Scissors
  • 27% Paper
Chart 1. Metagame Shares

| Strategy | Original Metagame | % of Top 8 (Chris) | % of Top 8 (Lucas) | Win % (Chris) | Win % (Lucas) |
|----------|-------------------|--------------------|--------------------|---------------|---------------|
| Rock     | 40.00%            | 13.57%             | 14.85%             | 13.33%        | 14.10%        |
| Paper    | 26.67%            | 55.43%             | 53.90%             | 16.36%        | 17.70%        |
| Scissors | 33.33%            | 31.01%             | 31.25%             | 70.31%        | 68.20%        |


I have included both the result of my simulation (1,000 trials) and Chris’ so that we can gain a little confidence that the algorithm is working (admittedly under some shared assumptions). Note that the use of 8 rounds (a 300-player Swiss event would normally run 9) was a small oversight by Chris, but since the example still provides the intuition we are looking for, I decided to run with it.
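To make the mechanics concrete, here is a minimal sketch of this kind of Swiss simulation. It is my own simplification, not a reproduction of either Chris’ program or mine: pairing is done by sorting on record with random tiebreaks, and an odd player out would receive a bye.

```python
import random
from collections import Counter

BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

def wins_match(a, b, rng):
    """True if deck a beats deck b; mirror matches are 50/50."""
    if a == b:
        return rng.random() < 0.5
    return BEATS[a] == b

def swiss(field, rounds, rng):
    """Run one simplified Swiss tournament; returns players sorted by wins."""
    players = [{"deck": d, "wins": 0} for d in field]
    for _ in range(rounds):
        rng.shuffle(players)                    # random order within each record
        players.sort(key=lambda p: -p["wins"])  # pair similar records together
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            winner = a if wins_match(a["deck"], b["deck"], rng) else b
            winner["wins"] += 1
        if len(players) % 2:
            players[-1]["wins"] += 1            # odd player out gets a bye
    rng.shuffle(players)                        # randomize final tiebreaks
    players.sort(key=lambda p: -p["wins"])
    return players

def top8_shares(trials=200, seed=1):
    """Share of top 8 slots per strategy over many simulated tournaments."""
    rng = random.Random(seed)
    field = ["Rock"] * 120 + ["Paper"] * 80 + ["Scissors"] * 100
    counts = Counter()
    for _ in range(trials):
        for p in swiss(list(field), rounds=8, rng=rng)[:8]:
            counts[p["deck"]] += 1
    return {d: counts[d] / (8 * trials) for d in BEATS}
```

With the Chart 1 metagame, this toy version reproduces the qualitative ordering above: Paper takes the largest share of top 8 slots despite being the least played strategy.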
The key results are that:
  • Paper is the best deck for top 8ing
  • Scissors is the play if you want to win
Metagame Rule 1: A popular but poorly situated deck will see its metagame share shrink over the course of the tournament, while the best situated decks become over-represented relative to their initial popularity. The metagame evolves.

Metagame Rule 2: This significantly impacts who top 8s (and as a corollary who wins). Beating the most popular deck is good for getting a top 8. But to win you want to beat the well situated deck. Winning and doing well (defined as top 8) should be treated as different goals.

This next chart shows what the above percentages mean from the perspective of a specific individual. For example, if I told you that Paper and Rock each had 33% of the top 8, you might think there was no advantage to picking one deck over the other. But the two strategies had different starting positions: while we might expect about 2.66 top 8 players from each, more people started with Rock than with Paper, so showing up with Rock is worse for each individual Rock player than showing up with Paper is for each Paper player.

Chart 2. Player Value (Representative Tournament)

| Strategy | Original Metagame (# Players) | # Players in Top 8 (Lucas) | # of Winners |
|----------|-------------------------------|----------------------------|--------------|
| Rock     | 120.00                        | 1.20                       | 0.14         |
| Paper    | 80.00                         | 4.32                       | 0.18         |
| Scissors | 100.00                        | 2.48                       | 0.68         |

In this case we can see that although roughly an equal number of tournaments are won by Paper and Rock, you personally are much more likely to win if you play paper (since fewer of them exist).

Metagame Rule 3: Whether a deck is good does not depend on how much of the top 8 it is expected to make up, but on whether (and by how much) the deck increases its metagame percentage from its initial position. You would rather be 5% of the initial meta and 10% of winners than play a deck that is both 50% of the meta and 50% of winners. This is important when we analyze the actual Pro Tour.
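Rule 3 reduces to a single ratio. A hypothetical helper (my naming, not from either program) that captures it:

```python
def added_value(meta_share, outcome_share):
    """Per-pilot advantage from deck choice.

    meta_share: fraction of the starting field on this deck
    outcome_share: fraction of the outcome (top 8 slots, or wins) it takes
    Returns 0.0 for a deck that merely holds its share, and 1.0 (+100%)
    for a deck whose pilots are twice as likely to hit the outcome as a
    player picked uniformly at random.
    """
    return outcome_share / meta_share - 1.0

# The trade-off from Rule 3: 5% of the meta but 10% of winners...
small_good = added_value(0.05, 0.10)   # +100% per pilot
# ...beats 50% of the meta and 50% of winners.
big_neutral = added_value(0.50, 0.50)  # 0%: no per-pilot edge
```

Applied to Chart 2, Scissors comes out around +104% (0.68 of wins from a third of the field) while Rock sits near −65%.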

3. Pro Tour: Return to Ravnica

Assumptions for the Simulation
  • There were 382 players, all of whom played every round.
  • The tournament was 10 continuous rounds of Modern, with no Limited portion.
  • Every match is governed by the win percentage matrix (see below).
  • The metagame consists of every deck with an initial metagame share greater than 1.5%; all other decks are lumped into “Other”.
The win percentage matrix is a table covering every deck that describes the probability that the deck on the Y-axis beats the deck on the X-axis. For rock-paper-scissors it looks like:

Chart 3. Win Percentages, RPS

| Win Probability | Rock | Paper | Scissors |
|-----------------|------|-------|----------|
| Rock            | 50%  | 0%    | 100%     |
| Paper           | 100% | 50%   | 0%       |
| Scissors        | 0%   | 100%  | 50%      |
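In code, the matrix is just a nested lookup. This sketch is my own representation (not the original program’s) of how a single match gets sampled from it:

```python
import random

# Chart 3 as a nested dict: WIN[a][b] = probability that deck a beats deck b.
WIN = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}

def sample_match(a, b, rng=random):
    """Sample one match outcome; True means deck a wins."""
    return rng.random() < WIN[a][b]
```

A useful invariant for any win percentage matrix: WIN[a][b] + WIN[b][a] must equal 1 for every pair of decks, which also forces every mirror entry to 50%.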

For PT RTR it looks like:
Chart 4. Win Percentages Modern



Some of this was filled out using Paul Jordan’s metagame article and some of it was just my subjective best guess. I tried two versions. In version one I used my best guess at rough win percentages. In version two (the one shown above), I tweaked version one until each deck’s probability of winning a random match was equal (or close) to its actual total win percentage at the Pro Tour (again from Paul’s article). The big difference comes in how the decks are treated when they play the nebulous “Other” decks. Based on these assumptions we can compare the implied overall win percentages with what actually happened, as a simple sanity check.

Chart 5. Comparing Assumed (based on above) vs Actual Win percentages



Notes from the previous two charts:
  • Affinity and Scapeshift have very good matchups against Jund, but not great matchups elsewhere
  • The two most popular decks have below average win percentages (Jund and Other)
  • Eggs has pretty much universally favorable matchups
  • Poison has a high overall win percentage but a poor Jund matchup
Simulation Results

This shows us how we should expect the tournament to shake out in terms of composition.

Chart 6.  Metagame and Top 8 Shares

The following gives us an idea of which decks were best. The added value measure calculates your advantage relative to a baseline where every person is exactly equally likely to top 8 (or win) the tournament: 100% means you are twice as likely as someone with no advantage or disadvantage from deck choice; −50% means you are half as likely. The probabilities are the probability that a given individual accomplishes a top 8 (or a win). In other words, by choosing to play Jund at PT RTR I was giving myself a 0.1% chance of winning, which is distinct from Jund having an 11.8% chance of winning overall.
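That last distinction is just division. Assuming pilots of the same archetype are interchangeable (they are in this simulation), a hypothetical helper:

```python
def per_pilot_win_prob(deck_win_prob, n_pilots):
    """Chance that one specific pilot of a deck wins the tournament,
    given the deck's overall win probability and its pilot count
    (pilots of the same archetype are treated as interchangeable)."""
    return deck_win_prob / n_pilots
```

For example, if Jund wins 11.8% of simulated tournaments and is piloted by roughly 110 of the 382 players (an illustrative count, not the actual figure), each individual Jund pilot wins about 0.1% of the time.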

Chart 7. Finding the Best Deck


My Takeaways
  1. Poison wasn’t hurt much by its bad Jund matchup. The rest of the field slowly whittled Jund down, and Poison was able to prey on the decks that were doing the whittling (Scapeshift, Tron, Eggs, etc.).
  2. Affinity was hurt because its main source of +EV was disappearing in the later rounds. It also came to form a large part of the meta as the tournament evolved, and thus faced increasingly frequent mirror matches (making it harder for individual pilots to succeed).
  3. Only 15% of decks significantly improved upon the benchmark for the purposes of top 8ing.
  4. Edit 01/29: Eggs was the second best deck choice for the tournament, but only the 4th most likely deck to win it. Of course, this ignores the possibility that Cifka's list was better than generic Eggs.
  5. The fact that there was so much Jund in the top 8 was probably due to above average Limited performances (Ochoa 15, Edel 12, Yuya 15).
  6. Storm’s performance (which was fairly solid) suggests a lack of “average” Storm players; more likely there were some people with very good versions and some with very bad ones.
Metagame Rule 4: In this environment most decks are bad choices; arguably 85% of decks were below average choices. If the most popular deck is not the “best deck”, the metagame decision is very important.

Metagame Rule 5: Being better with your deck (as opposed to trying to metagame) matters more when tournaments have fewer rounds. Imagine being a Rock player with a 20% win percentage against Paper and a 70% win percentage in the mirror. You still wouldn’t want to trade places with the average top 8 competitor from the first example. In general we probably overestimate the value of being good with your pet deck. This will fall out of the theoretical work I plan to examine later.
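A quick back-of-envelope on that skilled Rock player, using the Chart 1 metagame:

```python
META = {"Rock": 0.4000, "Paper": 0.2667, "Scissors": 0.3333}

def round1_ev(matchup_row, meta=META):
    """Expected win probability against a random round-one opponent."""
    return sum(meta[d] * p for d, p in matchup_row.items())

# The skilled Rock player from Rule 5: 70% in the mirror, 20% vs Paper,
# and Rock's natural 100% vs Scissors.
skilled_rock = {"Rock": 0.70, "Paper": 0.20, "Scissors": 1.00}

# An average Paper player, straight from Chart 3.
average_paper = {"Rock": 1.00, "Paper": 0.50, "Scissors": 0.00}
```

The skilled Rock player’s round-one EV works out to roughly 67%, versus roughly 53% for the average Paper player. Yet as the tournament progresses Paper’s share of the live field grows (Chart 1), and Paper is precisely the skilled Rock player’s worst matchup, so that round-one edge erodes. This is why round-one EV alone is a misleading guide to deck choice.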

Final Thoughts

I am going to add some further analysis based on theory to answer some hypotheticals that some may find interesting such as:
  • How much is being Yuya worth?
  • Does number of rounds matter a little or a lot?
  • Does the theory (which requires even more simplification) agree with the Simulations (spoiler: Science works)?
I would also like to run some numbers on another metagame where the “best deck” was actually the most popular deck. In what I expect will be the mother of all plot twists, I assume the math will say to play the best deck. Not sure where to find that stuff so I will do some digging. Maybe PT: Tempered Steel or Cawblade.

4 comments:

  1. Sweet post! It's gonna take a second, or maybe third read to fully digest.

  2. One nitpick. You said that Eggs winning was the second most likely *outcome*, when it was actually the second best *choice* in terms of maximizing your own chance of winning. Including "Other", there were four decks that were more likely than Eggs to win the entire tournament, Robots, Other, Jund, and UW, in that order.

  3. Would you be open to sharing your simulation code? I have some interest in modeling a metagame with a dredge deck. Something like a 3 game match, if your opponent is one of the n% who is prepared your g1 win % is 90% and g2 is 30%, or similar.

  4. Excellent article! I just came back to play MTG working with my teammates for GP Buenos Aires this coming week. I've been reading a lot of theory articles in just a few weeks trying to understand how to pick the right deck for the tournament and I have to say that this article has definitely improve my understanding of the game. I was wondering if you could share the source code of your work? Appreciate it.
