Tuesday, 16 July 2013

One Game


It can be hard to figure out exactly how good you are.

You can play a whole game and make zero interesting decisions.

Or you could spend eight turns finding out you have a long way to go.

Three friends went to GP Providence. We had practiced a lot. We had a history of some success (2nd at the last Team GP, etc.). But this isn’t a feel-good tournament report. And it isn’t an appeal for pity.

I have finally found some time (and my notes) to do some honest reflection on facing two (maybe 3) future hall of famers, and then being weighed, being measured and being found wanting. Not a tournament report. Not a match report.

So don't call this a report so much as a story about one game against the best in the world.

Before the 2nd draft on Day 2, we were 9-3. That’s not the end of the world, but it is not a great place to face Cheon, Froelich and LSV.

I was summarily dispatched by LSV in the middle seat. Jamie beat Paul. Which means it would be Maksym vs Efro for all the marbles. In game 2, we played well and managed to find all the right attacks. It was one of those games where you didn’t necessarily outplay your opponent, but rather managed not to snatch defeat from the jaws of victory.

I think most grinders would know the feeling.

So it’s time for game 3. The good news is that Maksym’s deck had Aetherling, Pack Rat and Soul Ransom. The bad news is that last year their team had more pro points than our lifetime totals combined. We were fighting the civil war of Ratinum. Efro’s deck was an aggressive Boros deck splashing blue for Ral Zarek and Beck//Call. We knew about at least one Weapon Surge and were on the draw for game 3.





Jamie and I do a mental high five when they take their mulligan and we see a rat. At least I think we do. Jamie’s probably too nice a guy to revel in our opponent’s misfortune, but I hate imagining myself on the solo end of a high five.

Grade = A++. Opponent on 6. We have turn 2 rat with 3 lands in hand. Played this part perfectly.



Grade = A. Nothing to screw up. Yet.



There is some small set of scenarios where not playing Pack Rat is correct (and Maksym broached the topic). However, against an aggressive deck I don’t think you can possibly afford to be that cautious.

Grade = A. Didn’t punt by not playing rat. No victories are too small for this story.

TURN 3



At 14 life we face our first real decision.

3) Should we spend a turn making rats or play a barrier?

3a) If we make a rat can we afford not to block?
If we don’t block next turn we will be at 10 (if he plays another creature), or 6 if he just double pumps.  After that we will have three 3/3s, but his Truefire Paladin is an abyss and his other guys are trading for rats.

3b) So we have to block if we make a rat. What are optimal blocks?
Presumably we would just block firstblade. A trick gets really costly here since we would be at ~10 with one rat facing 2 creatures. And again Truefire is close to abyss mode (assuming a 4th land).

3c) What’s the goal here?
We have soul ransom (he mulliganed) and tons of gas. So we just want this game to go as long as possible. Which means preserving life even if that means throwing away cards.

Hover Barrier makes the most sense in this context. It’s going to be especially good if his 4th land doesn’t allow for double pumps (e.g. isn’t a Mountain), which is a reasonable guess given his mulligan and being on the play.

Grade: A. Found the important strategy for the game.

TURN 4




Cluestone gives him double pump mana. Truefire gives a way to grow an army that could potentially fight rats. The whole game is going to shit. But he only has one card in hand and we have Soul Ransom. We could also Fatal Fumes here. Millennial Gargoyle, Call of the Nightwing and making PR#2 all don’t do enough defensively.

4) Should we Fumes or Soul Ransom?

4a) Assuming Fumes, what’s the optimal target?
We can’t afford to let him have guildmage in the long game and Soul Ransom isn’t a permanent answer. So we would have to fumes guildmage. He then attacks with both.

4a – II) If he attacks with both what do we block?
Chumping with rat seems inadvisable (but maybe we should have considered it), so where to put the Barrier is the question. Paladin would kill it, setting us up with a Pack Rat vs his board of two creatures and being at 10. We would ransom paladin, he would discard two and we would still be at 10 and have to chump with Pack Rat or go to 2. Not a winnable board state.

If we block the Viashino, he pumps twice we go to 6 and Soul Ransom his Truefire. He discards and we put Hover Barrier in front. Leaving us with a rat at 4 life versus his two creatures. Not a winnable board state.

4b) What about Soul Ransom? Optimal Target?
I think it’s safe to assume he is going to crack the Soul Ransom to get back whatever we take. If he gets it back immediately, taking the Truefire is better since he can’t attack right away and the paladin isn’t useful summoning sick. If he is holding a good card (or draws one) he might wait a turn or two to crack it. In which case taking the guildmage is better. I didn’t want to give him option value (e.g. the ability to draw cards just make dudes), so I suggested we take the Sunhome Guildmage.

Grade: B. Not playing the fatal fumes is good and not an obvious line. In retrospect taking the guildmage might have been bad, since we can always fatal it the next turn if he decides to wait.

TURN 5



After Efro plays Goblin Rally, it’s obvious he’s setting up to get his guildmage next turn.

5) Should we make a play, mainphase fatal fumes or hold up fatal fumes?

5a) Can we afford for him to get guildmage back?
No.

5b) So mainphase or wait for him to discard?
The first question is the interaction between Soul Ransom and Removal. Short answer is we get to draw 2. But, we had to ask a judge to confirm. Luckily LSV seemed to get the wrong read here (based on us asking the judge question). Maybe he assumed we knew basic rules interactions. Joke’s on him.

5b2) What happens if we wait, they figure it out and do nothing?
Well we have pack rat so our mana won’t really be wasted. And they won’t be able to attack. Seems like waiting is fine.

Grade A-: I think we made the right play, but it should have been obvious that we had removal because we had to talk to a judge. A massive leak which better players would avoid.

TURN 6


On his turn 6, Efro discards two cards and we respond with Fatal Fumes. I have listed the 3 cards drawn on our turn 6 (two from Soul Ransom). He still gets to attack his board into our Rat + Barrier. We could also chump with a rat.

6a) Who are we blocking with Hover Barrier?
If we don't block Truefire, we go to two life. We would also be facing 6 creatures, with 4 potential blockers. So we need to block Truefire Paladin.

6b) Should we chump with rat?
We need to start making creatures at this point and Rat can make 2 a turn. Can’t afford to chump block (on Efro’s T6). 

For our turn 6 making two rats is the only way to make two blockers and not die. He has 6 attackers and we have 3 blockers during his turn 7.

Grade: A. Made all the right plays, though it is not like there were real decisions.

TURN 7


After he attacks with everything (4 tokens, Firstblade, Truefire), we make 2 rats going up to 3 total and chump + kill 2 tokens. On our turn we face a bunch of possibilities given that we have 2 rats in play.

7) What are the options?
Plan to make two more rats on his turn (while holding up Cancel). Suppose we make the third rat and block everything but one token. He can either pump (+ first strike) his Paladin or not. If he does we make the 4th rat going to one but ending up with 3 rats. If he keeps his mana up we can trade boards and have cancel for his threat, followed by a threat. We can beat a burn spell (assuming it costs more than 2 mana) with this line.

Alternatively we can play a land and cast CotN.

7a) Why cast Call of the Nightwing (CotN)? Why Not?
He can’t block the ciphered rat (because we can make a third rat in combat). We end up with 2 bats, 2 rats (1 untapped) and the ability to make 1 more rat, but no Cancel. Our ciphered rat is unlikely to get in again. This is fine if he draws nothing.

However we lose to burn and maybe top decked tricks. We are also lower on cards in hand (because we need to make another land drop), so in a stalemate we could conceivably lose given his abyss Paladin.

Jamie and I thought we could afford to play around top decks (and hold up cancel).

Maksym wanted to CotN and try and end the game. Maksym was losing a game with turn 2 Pack Rat, so we overruled him. Just kidding. Kind of. Fuck Karma.

Grade: B. Upon further reflection I think it is definitely a close call. Also an important note was that his land didn’t make red.

TURN 8



With zero cards in hand. Untap. Upkeep. Efro draws his card.

Looks at LSV.

Cheon  ~ “We have to attack or eventually his rat will get us”.

Lucas -  “100% they drew weapon surge”.


Obviously we go into the tank.

8a) Could this be a bluff?
Very unlikely. If we didn’t have Cancel we are essentially forced to make two rats and quad block. This goes very poorly for them if they don’t have anything (the board becomes our 3 rats versus their paladin + one token). It’s probably worse than just sending Paladin. And since we are dead, we can’t really afford to play around anything.

This just reinforces the Weapon Surge read. I would like to think they give us enough credit to realize that bluff here doesn’t work. On the other hand the way they Hollywooded before attacking is a signal they aren’t giving us too much credit.

8b) Then what Sherlock?
Well we have to make a dude because we are dead without 3 blockers.

Lucas – “First things first, make a third rat”.

Sometimes you need to be precise. To be honest, I hadn’t even thought about what to discard. It was obvious to me that we needed the first rat, and I wanted to take an action to buy more thinking time. 

Except we needed to think first, because what we discard is important.

Unfortunately we discarded Deathcult Rogue.

8c) Can we make 4 guys and block?
Not if we actually believe he has weapon surge since he can plague wind us.

8d) What happens if we block only 3 guys?
He can weapon surge or use 2 abilities from Paladin, but not both.

If he decides to surge, then we cast Cancel, he makes paladin a 4/2. That ends with us at 1 life facing a token. Him with zero cards, but we would have CotN and Deathcult Rogue. Pretty good spot.

Except we discarded Deathcult Rogue. So we would have Island and CotN. When he attacks with the token we have to chump with a token. And it’s a topdeck war with us at 1 life. He has a cluestone he can crack to find an extra card as well. That isn’t great for us.

If he doesn’t cast Weapon Surge and instead makes a 4/2 first strike we can make another rat. He loses a goblin token and a Viashino Firstblade. We are at 1 life, but have 3 rats. Even better than above.

8e) Ok so, assuming he plays correctly (and Weapon Surges), what do we do now that we have discarded Deathcult Rogue?

Then the doubt creeps in.

What made me so sure he had drawn weapon surge? Obviously a snap read is based on intuition but if you put a gun to my head how sure would I have been really? 70%? 90%? How likely are we to win the games where he is actually bluffing, and we just call?

Some people would tell you the pressure was overwhelming or they felt the world on their shoulders. But it was nothing so dramatic. My lucky history in Magic has given me a wealth of experience in being embarrassed during feature matches.

Instead I gave my team “the speech”.

Lucas: “We fucked up. We are probably 10-20% to win if we play around my initial read. We are close to 100% to win if we don’t play around it and my read was wrong.

What do you guys want to do?”

Them: YOLO.

Oddly enough this seems to primarily be the refrain of those in the process of committing suicide.

We would be no exception.

Final Thoughts:
They had the weapon surge. We lost.

We played a game for 8 turns (7 on our side) and made at least 3 mistakes, 2 of which may have cost us the match.

We played a turn 2 pack rat and lost.

Because we made the perfect read against one of the best teams in the world, we had a chance to win even when they were drawing pretty well.

It’s unfortunate that Magic chose that moment in time to be a skill game.

FIN.

Friday, 5 July 2013

HoF Voting. Quant Style.


I don’t want to get into an argument about the use of intangibles or subjective achievements (penalties as well) in HoF voting. This is just going to be a simple explanation and presentation of a mathematical approach to judging deservedness. It’s not necessarily how I would vote exactly, but I think it’s a more honest method than most.

1. The first cut.
Due to time constraints (aka lazy constraints) I only considered players with 5 top 16s or more. The cut is somewhat arbitrary, but it left me with 25 candidates and it seems like below that number you would have to rely on subjective arguments anyways (e.g. the Pikulas and Herberholzes of the world).

Once I did this, I did all analysis WITHOUT names attached, to remove as much bias (during the methodology creation) as possible.

2. The meat of the method.
I created 5 super categories. Each one has multiple components. Then I gave a weight to each of these categories. This creates an overall score for each player and the top 5 scores were reported. I like this methodology since it can tell you where a player has a deficit or strength. If you disagree with my weights it is simple enough to see how the rankings change based on your own personal preferences. For example, Lauren Lee doesn’t think consistency should matter whereas my friend Sam seemed to think it much more important.

Below I list the 5 categories, the weight assigned to each, an example of a subcomponent and some discussion of players who excel or fail in the category. Finally I might add some color commentary.

Note some weights changed since I posted on facebook based on discussions with people I respect.

Longevity (10%):  
How long was the player at a high level of Magic? The simplest subcomponent is # of PTs played.

Top 3 (always in order)
Ikeda, Yasooka, Stark

Bottom 3 (no order unless mentioned)
Krempels, Justice, Soh/Kaji.

This is one of the places where Justice gets really punished. If you put little weight on Longevity, I think it’s hard to argue that he shouldn’t get in.

Consistency (15%):
Was the person consistent at the highest level of play? PT Median Finish is here. Note this is somewhat independent of how long the player played.

Top 3)
LSV, Efro, Osyp/Mori

Bottom 3)
Jurkovic, Tiago Chan, Geoffrey Siron

I am not sure that consistency should be that important. If someone was bad early on in their PT career, but became dominant I could fully imagine they belong in the HoF.

Best in World (25%):
Could we consider the player amongst the best in the world during some extended period of time? I think it’s hard to argue that someone is among the best of all time if you can’t even provide evidence that they were the best during their time. 3 YR Median is one of the subcomponents.

Top 3)
LSV (Get used to this), Saito, Wafo-Tapa

Bottom 3)
Reitzl (Booooo), Ikeda (first good argument for why he shouldn’t be in), Jurkovic

Place in History (25%): How unusual is their resume? Do they have something that really stands out, makes you say “Wow that would be hard to do”. How many standard deviations above the mean are their stats.

Top 3)
LSV, Saito, Yasooka (15 GP Top 8s, 16th on Money List, insanely high pro points)

Bottom 3)
Krempels (no idea who he is for good reason?),  Tiago Chan, Justice (0 GP Top 8s, almost no pro points).

Again this tries to quantify place in history, I know lots of people would argue Justice should be higher but I need a data point and I don’t have one.

Skill (25%): This is probably the most controversial category, I think even if you had low skill and had results from the above categories you might deserve to be in. %Top 16s is an example of skill.

Top 3)
Justice, LSV (now I wanna see a Justice/LSV grudge match), Efro

Bottom 3)
Tiago Chan, Ikeda, Fabiano

4th was a tie between Johns/Kaji.

The Final Top 10 (with scores; lower is better):
  1. LSV (1.65)
  2. Saito (6.4)
  3. Yasooka (7.55)
  4. Efro (7.85)
  5. Osyp (8.1)
  6. Gary (8.1)
  7. Stark (8.45)
  8. Wafo-Tapa (8.85)
  9. Mori (8.9)
  10.  Johns (9.05)
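
If you want to play with the weights yourself, the scoring can be sketched roughly like this. It is only a sketch: I am assuming each category boils down to a single rank per player and that the overall score is the weighted average of those ranks (lower is better). The function name, the two players and their ranks are all made up for illustration, not my actual data.

```python
# Hypothetical sketch: combine per-category ranks into one weighted score.
# Weights are the ones listed in the post; everything else is illustrative.
WEIGHTS = {
    "longevity": 0.10,
    "consistency": 0.15,
    "best_in_world": 0.25,
    "place_in_history": 0.25,
    "skill": 0.25,
}

def overall_score(ranks, weights=WEIGHTS):
    """Weighted average of category ranks; lower is better."""
    total = sum(weights.values())
    return sum(weights[c] * ranks[c] for c in weights) / total

# Made-up ranks for two imaginary candidates (not real data).
players = {
    "Player A": {"longevity": 3, "consistency": 2, "best_in_world": 1,
                 "place_in_history": 4, "skill": 2},
    "Player B": {"longevity": 20, "consistency": 6, "best_in_world": 9,
                 "place_in_history": 22, "skill": 1},
}

for name, ranks in sorted(players.items(), key=lambda kv: overall_score(kv[1])):
    print(f"{name}: {overall_score(ranks):.2f}")

# Re-weighting: drop Longevity and Place in History entirely.
no_history = dict(WEIGHTS, longevity=0.0, place_in_history=0.0)
print(f"Player B reweighted: {overall_score(players['Player B'], no_history):.2f}")
```

Dividing by the total weight means you can zero out a category without having to rescale the others by hand.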

What do the rest of the top 10 have to do to make a move into the top 5:

Gary – literally anything to break the tie with Osyp.

Stark – Scores worst in Skill (low Median) and BiW (low 3 Year Median), which I am sure many would disagree with. Honestly just weighting those two areas slightly lower and he’s in.

Wafo-Tapa – Consistency and Longevity were his two weakest areas and the ban definitely didn’t help that.

Mori – Skill and BiW need improvement.

Johns – Longevity is far and away his worst score. Hard Time to start PTQing IMO.

Justice – If you put 0 weight on Longevity/Place in History, Justice becomes a slam dunk candidate (2nd to LSV, with Saito falling to 8th in this case).

Monday, 17 June 2013

Welcome to Vegas

Welcome to Las Vegas

Vegas is going to be the largest Magic tournament ever. And it’s going to be the largest by a big margin. Which makes it an interesting exercise to try and figure out the implications for how that affects records and the cut to top 8.

For those who read my facebook notes, you will have seen my previous attempt to make a guess at what records would top 4 the Player’s Championship. Something I nailed with reasonable accuracy. The methodology is fairly simple. I assume everyone is 50/50 in every matchup and then simulate it a crapload of times. Draws are not allowed.
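
For the curious, the idea can be sketched in a few lines of Python. This is not my actual program, just a minimal stand-in under the same assumptions (every match is a coin flip, no draws). The pairing rule here is my own simplification: sort by record and pair neighbours, with no check for repeat opponents and no tiebreakers, and a bye counted as a win if the field is odd.

```python
import random
from collections import Counter

def simulate_swiss(n_players, n_rounds, rng):
    """Return each player's final win count; every match is a 50/50 flip, no draws."""
    wins = [0] * n_players
    for _ in range(n_rounds):
        # Shuffle, then stable-sort by record so players meet others on the same record.
        order = list(range(n_players))
        rng.shuffle(order)
        order.sort(key=lambda p: wins[p], reverse=True)
        # Pair neighbours in the sorted order.
        for i in range(0, n_players - 1, 2):
            a, b = order[i], order[i + 1]
            winner = a if rng.random() < 0.5 else b
            wins[winner] += 1
        # With an odd field the last player gets a bye, counted as a win.
        if n_players % 2 == 1:
            wins[order[-1]] += 1
    return wins

def eighth_place_wins(wins):
    """Win count of the player in 8th place (ignoring tiebreakers)."""
    return sorted(wins, reverse=True)[7]

rng = random.Random(2013)
trials = 10
results = Counter(eighth_place_wins(simulate_swiss(5000, 15, rng)) for _ in range(trials))
print(results)  # how often each win total was enough for 8th place
```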

There were 3 problems adapting this framework to Vegas:

  • How to incorporate byes.

I figured Vegas would have about 4000 people and a large number of those would have some number of byes. The original program isn’t designed to handle that, so I figured I would compensate by just setting the number of participants at 5000.

  • Time to Run

The initial version was pretty slow, which wasn’t a big deal when I had to simulate a 12 round tournament with 16 people. I had to make some pretty big adjustments to speed it up for a 15 round 5000 person tournament. I don’t think I made any errors when making these adjustments, but who knows.

  • Trials.

I am down to about 100 trials because of how long this thing takes. The information regarding 12–3 players is based on 10 trials.

Results

  1. The record needed to top 8 will be 13–2. In all 100 trials 8th place was 13–2.

  2. Between 17–20 people end the tournament with that record or better. The average was 18.72. So on average 10 people missed the cut at 13–2.

  3. An average of 87.8 people were 12–3 or better. Which means 20+ people were missing the money with a 12–3 record. The first GP I ever travelled to (GerryT winning in Denver) that record was a lock for t8. This actually makes me suspicious that I did something wrong (because the result is so ridiculous), but I haven’t found what it could be yet.
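
One way to sanity check numbers like these without the simulation: if every match really is a coin flip with no draws, each player's win total over 15 rounds is a plain Binomial(15, 0.5) no matter how the pairings work, so the expected number of players at a given record or better is easy to compute. The function name below is just for illustration.

```python
from math import comb

def expected_at_least(n_players, n_rounds, min_wins):
    """Expected number of players with at least min_wins when every match is a fair coin flip."""
    tail = sum(comb(n_rounds, k) for k in range(min_wins, n_rounds + 1))
    return n_players * tail / 2 ** n_rounds

print(round(expected_at_least(5000, 15, 13), 2))  # 13-2 or better -> 18.46
print(round(expected_at_least(5000, 15, 12), 2))  # 12-3 or better -> 87.89
```

Those land very close to the simulated averages of 18.72 and 87.8, which suggests the ridiculous-looking result is what a 5000-player, 15-round field produces under these assumptions rather than a bug.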

Practical Implications

  1. You can drop at X–4.

  2. Draws are much better than they would otherwise be, especially late on day 1.

    For example, assume you are 7-1 going into the last round of Day 1. A draw is going to be significantly better than a loss here assuming you care about t8 only. With either a draw or a loss you have to win out to have a shot at Top8ing. But with a draw you are 100% to top 8 assuming you win out. With a loss you are almost 100% eliminated from top 8.
    
  3. If you are 7–2 on day 1, your odds of top 8ing even if you win out are essentially zero.

Breakers are going to matter a lot for the top 8 cut, so losing early is costly (see above). Though you can still obviously qualify for Dublin.


Sunday, 31 March 2013

The Fog of War


Decklist


4 Breeding Pool
4 Temple Garden
2 Hallowed Fountain
1 Overgrown Tomb
1 Watery Grave
3 Sunpetal Grove
4 Glacial Fortress
4 Hinterland Harbor
1 Alchemist's Refuge
1 Nephalia Drownyard


3 Augur of Bolas
2 Snapcaster Mage

1 Gideon, Champion of Justice
1 Jace, Architect of Thought
1 Jace, Memory Adept
2 Tamiyo, the Moon Sage
1 Garruk Wildspeaker

4 Fog
1 Clinging Mists
2 Feeling of Dread
4 Supreme Verdict
1 Terminus
2 Azorius Charm
1 Selesnya Charm
3 Sphinx's Revelation
2 Urban Evolution
4 Farseek

SB:
1 Nephalia Drownyard
2 Loxodon Smiter
2 Thragtusk
1 Pithing Needle
2 Witchbane Orb
3 Dissipate
1 Dispel
1 Oblivion Ring
1 Detention Sphere
1 Curse of Echoes

Selesnya Charm - You need a way to kill Obzedat (it's basically the only relevant thing Junk Rites can do against you). It would be better if we could find an answer that hits Falkenrath as well.

Gideon - The best win condition against Junk Rites. Also good against any lingering souls deck.

Urban Evolution - Better than the 4th Revelation because it's slightly less clunky early. Also going 5 into fog is very common.


Strengths:

Junk Rites is a Bye.
Aggressive Red Decks are positive. Especially if they don't have Skullcrack.
Esper Control/UWR are at all time lows.
Often has decent matchups against the niche crap which people bring to beat Reanimator because they aren't really super interactive decks (Hexblade etc....)


Weaknesses:

Time Management.
Softness to blue cards especially when backed up by a clock.
Easy to hate (for example it would be easy to build a Jund deck which just kills this).

The deck isn't easy to play. Be very conscious of mana efficiency. It's often correct to Snap -> Fog before casting a fog in your hand, to make a revelation turn better. You need to be very precise technically (which is normally easy) but you also have to be super fast, since most games take 10+ minutes even when you win.

Friday, 1 February 2013

Simulations Part 2: Yuya, Jund v2, Bannings and PT Nagoya


Summary from Last time

I think there was a misunderstanding about what I was trying to do with my last post.

To be clear: If you could guess the metagame and win percentage matrix perfectly you would know the best deck.

But this is obviously both unlikely and extremely costly time wise. Instead I want to use the math, programming and examples to challenge commonly held beliefs of the “pro” community which may or may not be true. All of us rationalize deck choice and it is useful for us to try and at least take an analytical lens to these arguments. So I wanted to summarize the practical advice that applied from my last article:
  • Don’t play the most popular deck if it’s bad (even if you are good with it). The playskill edge you need to make this worth it is large. See later section for details.
  • If you want to top 8 (or “do well”) focus on beating the most popular deck.
    • But a complete glass cannon doesn’t work either. See example: affinity/scapeshift/tron.
  • If you want to win (e.g. a PTQ), don’t worry about beating the most popular deck; worry about beating the decks that beat the most popular deck.
    • Top 8ing and Winning can require distinct deck selection considerations. See RPS example from previous post.
  • Just looking at what percent of the metagame a deck makes up (even a winner’s metagame à la Karsten/Chapin) is not a good indicator of what the best deck is, and even more unintuitively it may not be a good guide as to what you should be focusing on beating.

This Week: Motherfucking Science!!!?#$

  1. Addressing reader comments on the previous article.
  2. How much is being Yuya worth (other than 57 pro points)?
  3. What happened to Jund.
  4. Theory as applied to PT Nagoya
  5. Theory vs Simulations. Math! Proofs! Almost Rigorous!
  6. 5 minute break to relieve your boner from the last section.
  7. Conclusions

1. Comments from last time

I would like to take a moment and genuinely thank everyone who made comments on the previous post. Most of it was on my facebook wall and it was cool to see how many people enjoyed a slightly different approach to Magical analysis. I appreciate every single comment and will try to address some of the points here.

Thanks to Paul Jordan who did some more analysis and hooked me up with some excel so that I could format things more easily.

Do more trials? Short answer: I think this is a non-issue. I am not sure why 1000 trials is not enough. The distributions I am using aren’t exotic enough for me to think they warrant it and my code is unbearably slow (1000 trials is already an overnight process). That being said, Jarvis has been sent the code and may be able to optimize it.
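
A rough way to put a number on "enough": if each trial is one simulated tournament and we are estimating something like "how often does this deck win the whole thing", the noise on that estimate is the usual standard error of a proportion. The 10% win rate below is a made-up figure, purely for illustration.

```python
from math import sqrt

def standard_error(p, n_trials):
    """Standard error of a proportion estimated from n_trials independent simulated tournaments."""
    return sqrt(p * (1 - p) / n_trials)

# Hypothetical deck that wins 10% of simulated tournaments.
for n in (100, 1000, 10000):
    print(n, round(standard_error(0.10, n), 4))
# 100 -> 0.03, 1000 -> 0.0095, 10000 -> 0.003
```

So at 1000 trials a 10% tournament-win estimate is good to within about a percentage point, and going to 10,000 trials (ten nights of runtime) only tightens that to about a third of a point.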

What happens now that Jund lost BBE? To be answered in a third and final post hopefully next week. I also want to address the # of rounds importance. But I imagine it will require a subsequent post, because people only want to read so much boring math. Eli Priest already had the gist if you read his facebook post.

What about writing for a major website? Unfortunately not possible right now. I really appreciate everyone who shared and retweeted the link for this blog since I don’t have reach any other way. Special thanks to Sperling (who tweeted) and whoever posted it on Reddit (4000 views from Reddit and 400 from MTGSalvation forums).

What about Model/Metagame uncertainty? Is this useful? Again this isn’t a tool for predicting the tournament exactly ex-ante. Rather a tool that helps us analyze “How We Should Think”.

Top 8 is 3 of 5; did you account for this? No. In a related point one reader thought I would systematically underestimate a top 8 deck winning (conditional on having top 8ed). Empirically that might be true. And the 3 of 5 might have to do with that. A deck with a great sb will get an edge in the top 8, rendering the win percentage matrix not constant throughout the tournament. It’s a rather trivial addition to my program to correct for this but I am not sure how I would get the correct assumptions. Moreover many tournaments rely on 2 of 3 in the top 8 (GPs, PTQs, FNM etc.).

Pro Tours have Limited. Nothing much I can do about that other than assume it has zero impact or flip coins for every Limited match. I have asked Paul Jordan to look into how Jund players did during the Limited rounds of PT RTR (to get a sense of how much it matters), but it’s a ton of work I imagine.

What about variations in player skill and deck construction? See the section on what being Yuya is worth. Unfortunately I can only use Paul Jordan’s categorizations of decks. And he needs to aggregate disparate lists to get meaningful sample size on deck win percentages.

Is there anything else more practical we can do? Some ideas I have had:
  • How much can tie breakers actually move at the end of a X round tournament (Sorry Conley!).
  • Is the MODO metagame rational? Are there “sticky” deck choices (for all you economists out there)? What’s the time lag for information processing? Obviously you could check the IRL metagame as well.
  • Is Yuya a robot?
  • Prices. A long time ago I sent an involved article to Channelfireball about card prices, I am not sure exactly what happened to it. If there is demand for this kind of thing, I would consider trying to find it or redoing it. Essentially I wanted to mythbuster magic finance.

2. Why I would rather be Ari Lax than Yuya Watanabe.

From this point on I will be using theoretical probability tools as well as simulations. For a detailed discussion of why this might matter check out section 5. Otherwise take my word for it that the theory is sound.

We can measure how good a deck is in a given round by calculating its Expected Winning Percentage. Imagine Yuya has a 10% higher win percentage in every matchup (including the mirror). That’s a pretty substantial edge, especially at the Pro Tour level.
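
In code, Expected Winning Percentage is just a metagame-weighted average of a deck's matchup numbers. A minimal sketch: the metagame shares and matchup percentages below are invented for illustration (they are not the PT RTR data), and I am reading "10% higher" as a flat 10 percentage points added to every matchup.

```python
def expected_win_pct(metagame, win_pct_vs):
    """Metagame-weighted average of one deck's matchup win percentages."""
    return sum(share * win_pct_vs[opp] for opp, share in metagame.items())

# Hypothetical numbers, not the PT RTR data.
metagame = {"Jund": 0.30, "Affinity": 0.20, "Tron": 0.15, "Other": 0.35}
jund_vs = {"Jund": 0.50, "Affinity": 0.45, "Tron": 0.40, "Other": 0.55}

base = expected_win_pct(metagame, jund_vs)
# A flat 10-point skill edge in every matchup, as in the Yuya thought experiment.
yuya = expected_win_pct(metagame, {opp: p + 0.10 for opp, p in jund_vs.items()})
print(f"average pilot: {base:.4f}, with +10%: {yuya:.4f}")
```

Because the shares sum to 1, a flat edge in every matchup moves the expected win percentage by exactly that edge in any single round. The interesting part is what happens to the metagame shares as the rounds go by.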

Chart 1.


 How do we make sense of this? In round 1 Yuya has the best win percentage of everyone. Yet in round 10, the value of being him with Jund is worse than being an average player with Poison, Eggs or Tron.

If we stop and think, this is just a simple corollary of the previous post. By the time round 10 gets around all of Jund’s good matchups have been drastically squeezed and its bad matchups have proliferated. This happens because of the popularity of Jund. So Jund’s win percentage at the top tables is in constant decline. Yuya still has his 10% edge, but it isn’t enough to overcome his deck selection disadvantage (theoretically anyways, since obviously he top 8s and thus implodes math).
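
The squeeze can be illustrated with a toy model. Assume (this is a simplification, not necessarily what my program does) that the metagame among players who keep winning updates each round in proportion to share times expected win percentage. The three decks and every number below are hypothetical.

```python
def next_round_metagame(metagame, matrix):
    """Metagame among players who keep winning: each share scales with its expected win pct."""
    ewp = {d: sum(metagame[o] * matrix[d][o] for o in metagame) for d in metagame}
    weighted = {d: metagame[d] * ewp[d] for d in metagame}
    total = sum(weighted.values())
    return {d: w / total for d, w in weighted.items()}, ewp

# Hypothetical three-deck format: "Popular" is half the room but loses to "Predator".
matrix = {
    "Popular":  {"Popular": 0.50, "Predator": 0.40, "Prey": 0.55},
    "Predator": {"Popular": 0.60, "Predator": 0.50, "Prey": 0.40},
    "Prey":     {"Popular": 0.45, "Predator": 0.60, "Prey": 0.50},
}
metagame = {"Popular": 0.50, "Predator": 0.20, "Prey": 0.30}

for rnd in range(1, 6):
    metagame, ewp = next_round_metagame(metagame, matrix)
    print(rnd, {d: round(v, 3) for d, v in ewp.items()})
```

In this toy example the popular deck starts at 49.5% and drifts downward as the deck that preys on it takes up more of the winners' bracket, which is the same mechanism that eats into Yuya's edge.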

3. But can we explain what happened after PT RTR - Jund Edition

I think it’s fairly clear to most of us that the Pre-PT Jund builds were often inferior to what would become the best version of the deck. Deathrite, Liliana and Lingering Souls weren’t even mainstays at that point. As a proxy for how the season developed I reran the simulation for PT RTR, but gave all Jund players a 5% bump in every non-mirror match. How did that change things:

Chart 2.


Note if the improvements over the course of the last 4 months were even larger it’s reasonable to see how Jund might have won 75% of the GPs. But a large part of its dominance would still be due to its initial meta size. The “improved” Jund from this case only wins ~52% of its matches. If the newest versions “solve” the Affinity matchup it wouldn’t up their win percentage by that much but would have changed their win tournament percentage to the ~35% range.

If an extremely popular deck has a positive expected win percentage (even if that edge is small), it will post DOMINANT results.

I wonder if there is some kind of psychological feedback mechanism in play at this point. The deck wins so people play it. But people playing it means it wins. Thus a deck seems dominant when in reality it would be perfectly rational to play a host of other reasonable choices. #ThinkingCapsOn

I don’t want this blog post to get sidetracked, but I think in the wake of the B&R announcement, it’s easy to see how Wizards might have made a rational overreaction. Banning might have been needed to break up the cultural inertia that had built up behind Jund. The metagame was stale not due to Jund’s dominance but because of its inertia. Bans are a way to encourage diversity by changing people’s perceptions (they think Jund is now as bad as it actually already was), but not the reality.

Let me know if this makes sense. Summarizing:
  • Jund is actually not a great deck (~53% with some bad matchups)
  • But people think it’s great (>60%) so a lot of them play it
  • The combination leads to a lot of success kind of like 10,000 monkeys on 10,000 typewriters. This reinforces the erroneous beliefs.
  • Wizards bans BBE which has zero impact on the actual viability of Jund but makes people adjust their beliefs regarding its power.
  • Now that its perceived power is equal to its actual power, people again begin trying alternatives.
  • Thus the metagame becomes more diverse.
  • If people were completely rational they would have tried new things even without a ban. But we needed a shock to the system because of incorrect perceptions/metagame inertia/some other reason.
Realistically Jund was probably overperforming too much for the above to be true, but I think it’s in the realm of possibility.

4. PT Nagoya

Per PV’s suggestion I thought Nagoya would be an interesting second case because the popular deck was actually very good. As of this very moment the Simulation for the PT is running but I would like to present my estimates based on theory for similar metrics to the last post. If the simulation ends up being drastically different than my predictions it will be reported.

In this case I am much less confident about how I filled in the win percentage matrix since I never played in the block format. If someone good wants to double check that for me shoot me a PM or comment. I also don’t have Infect or Tezzeret variations separated out.

Chart 3.




In this case I think intuition lines up much better with the results. The three best decks in terms of overall win percentages also top 8 the most. The two non-Tempered Steel decks with the best Tempered Steel matchup are the best decks for both top 8ing and winning the tournament. We can take away:
  • If the popular deck is good, it’s a fine play. Especially if you want to top 8 (as opposed to needing to win).
    • If you personally have a good mirror match, then the deck becomes a very good choice. Unlike the previous Yuya example, there is no adverse selection in the metagame: your bad matchups don’t get more popular.
  • Beating the most popular deck is much more important if the deck is good. This seems to be independent of your goal (Top 8 v Win) in this case.
  • If the popular deck is good, the number of viable decks is probably much smaller than when the most popular deck is bad (duh?).

5. Theory vs Simulations. Math! Proofs! Almost Rigorous!

Estimating the results by theory has a couple of huge advantages. The disadvantage is that I have to make even more assumptions. The advantage is mostly to do with speed and being able to adjust parameters instantly for instant results.

A comparison of results for the original PT RTR example. Simulation vs my Theoretical results. Note for the top 8% theoretical I am using the theoretical metagame of X–1s or better. This obviously isn’t exactly equal to the top 8.

Chart 4.


The results are very close for the top 8. And kind of close for the Win %. Not sure if that’s because the simulation has noise, or the latter theoretical numbers are overburdened by the assumptions. Either way I am pretty comfortable pending the results of the Nagoya simulation.
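
The post doesn’t spell out the theoretical method, so what follows is only a rough sketch of one way to get a "theoretical metagame of X-1s or better". It is a mean-field pass: every record bracket is treated as one big pool, and each deck wins at its average rate against that pool. The function names are hypothetical, and demoing it on the Rock Paper Scissors numbers (40% Rock, 27% Paper, 33% Scissors) is my assumption for illustration. This is not the actual calculation behind Chart 4.

```python
from collections import defaultdict

# Row deck beats column deck with this probability (mirror is 50/50).
WIN = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}
SHARES = {"Rock": 0.40, "Paper": 0.2667, "Scissors": 0.3333}


def expected_metagame_by_record(shares, win, rounds):
    """Return {(wins, losses): {deck: share of all players}} after `rounds`.

    Assumption: players only face opponents with the same record, and
    brackets are large enough that averages stand in for random pairings.
    """
    brackets = {(0, 0): dict(shares)}
    for _ in range(rounds):
        nxt = defaultdict(lambda: defaultdict(float))
        for (w, l), mass in brackets.items():
            total = sum(mass.values())
            if total == 0:
                continue
            for deck, m in mass.items():
                # Chance this deck beats a random opponent from its own bracket.
                p_win = sum(mass[opp] / total * win[deck][opp] for opp in mass)
                nxt[(w + 1, l)][deck] += m * p_win
                nxt[(w, l + 1)][deck] += m * (1 - p_win)
        brackets = {record: dict(decks) for record, decks in nxt.items()}
    return brackets


def share_with_at_most(brackets, max_losses):
    """Deck composition among players with `max_losses` losses or fewer."""
    totals = defaultdict(float)
    for (_, losses), mass in brackets.items():
        if losses <= max_losses:
            for deck, m in mass.items():
                totals[deck] += m
    grand = sum(totals.values())
    return {deck: m / grand for deck, m in totals.items()}


if __name__ == "__main__":
    after_8 = expected_metagame_by_record(SHARES, WIN, rounds=8)
    for deck, share in share_with_at_most(after_8, max_losses=1).items():
        print(f"{deck:9s} {share:.1%} of the X-1 or better players")
```

Because nothing here is random, changing the metagame shares or the number of rounds gives a new answer instantly, which is the speed advantage mentioned above. The cost is the extra assumption that pairings inside a bracket behave like averages.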

6. Take a minute to please tweet this post. You can include me @toordeforce or not. Also feel free to share on fbook.

Don’t worry you can alt-tab. I’ll still be here.

7. Conclusion 

Obviously we are just barely scratching the surface of what’s possible here. I hope to do one last follow up post on simulating metagames and then move on to other things (possibly one of the questions mentioned previously).

Monday, 28 January 2013

Simulating Metagame Evolution and PT Return to Ravnica

The tl;dr

Generally:
  • Beating the most popular deck is overrated
  • If your goal is winning the tournament (as opposed to top 8ing or doing well generically) then the optimal strategy may be very different.
For PT: RTR
  • Eggs was the best deck in round 1.
  • Poison was the best deck in round 10 of modern.
  • Scapeshift, Robots and UW were worse than you know

Table of Contents

  1. Introduction
  2. The Rock Paper Scissors Example (Mascoli 2012)
  3. PT: RTR
    3.1 Relevant Assumptions
    3.2 Simulation Results
    3.3 Theoretical Results (to be updated at a later date)
    3.4 Final Notes

1. Introduction

Ask ten pros and you will get ten answers on how and whether you should metagame. However, almost all of us (or if I choose to keep it real “them”) are answering mostly based on experience and intuition. Until recently I thought that was fine. The truth is far more interesting.

Chris Mascoli (check Gatheringmagic.com) recently published an article on metagaming and Magic which was the entire impetus for the work I have done here. In the comments surrounding the article Mike Flores was credited with coming up with some ancient work which originally exposited on the idea. I hope to build on what they have started.

I am going to try and give more explanation, do more theoretical work (as opposed to pure simulations) and finally apply it to the most recent pro tour. The conclusions are hopefully interesting and unobvious enough to be worth devoting a mammoth post to. The program I used to simulate the results was created completely independently of Chris’ work, and this conveniently provides me a way to test the validity of the program (assuming Chris’ work was also correct).

I summarize the key insights in Metagame Rules that are bolded below.

2. Advanced Rock Paper Scissors

Consider a rock paper scissors tournament where you pick one strategy and must play that option every round. The tournament is run using the standard Swiss rules and a mirror match is 50/50. What is the optimal deck if the tournament featured 300 players and 8 rounds with the following metagame:
  • 40% Rock
  • 33% Scissors
  • 27% Paper
Chart 1. Metagame Shares

| Deck | Original Metagame | % of Top 8 (Chris) | % of Top 8 (Lucas) | Win % (Chris) | Win % (Lucas) |
|---|---|---|---|---|---|
| Rock | 40.00% | 13.57% | 14.85% | 13.33% | 14.10% |
| Paper | 26.67% | 55.43% | 53.90% | 16.36% | 17.70% |
| Scissors | 33.33% | 31.01% | 31.25% | 70.31% | 68.20% |


I have included both the result of my simulation (1000 trials) and Chris’ so that we can gain a little confidence that the algorithm is working (admittedly using some assumptions). Note the use of 8 rounds (which is incorrect) was a small oversight by Chris but since the example still provides the intuition we are looking for I decided to run with it.
The key results are that:
  • Paper is the best deck for top8ing
  • Scissors is the play if you want to win
Metagame Rule 1: A popular but poorly situated deck will see its metagame share shrink over the course of the tournament. The top situated decks start to become over-represented relative to initial popularity. The metagame evolves.

Metagame Rule 2: This significantly impacts who top 8s (and as a corollary who wins). Beating the most popular deck is good for getting a top 8. But to win you want to beat the well situated deck. Winning and doing well (defined as top 8) should be treated as different goals.
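
For anyone who wants to poke at the example themselves, here is a minimal sketch of a simulation along these lines. It is not the program used for the numbers above. The function names, the adjacent-record pairing with no rematch check, the random tiebreakers and the seeded top 8 playoff used to pick the winner are all my assumptions.

```python
import random
from collections import Counter

# Row deck beats column deck with this probability (mirror is 50/50).
WIN = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}
FIELD = {"Rock": 120, "Paper": 80, "Scissors": 100}  # 300 players


def swiss_standings(field, win, rounds, rng):
    """Play one Swiss tournament and return decks ordered best record first."""
    decks = [deck for deck, n in field.items() for _ in range(n)]
    wins = [0] * len(decks)
    for _ in range(rounds):
        order = list(range(len(decks)))
        rng.shuffle(order)                  # random order inside each record
        order.sort(key=lambda i: -wins[i])  # stable sort groups equal records
        # Pair neighbours; an odd-sized record group pairs down by one.
        for a, b in zip(order[0::2], order[1::2]):
            if rng.random() < win[decks[a]][decks[b]]:
                wins[a] += 1
            else:
                wins[b] += 1
        if len(order) % 2:                  # odd player out gets a bye
            wins[order[-1]] += 1
    final = list(range(len(decks)))
    rng.shuffle(final)                      # random tiebreakers (assumption)
    final.sort(key=lambda i: -wins[i])
    return [decks[i] for i in final]


def play_top8(top8, win, rng):
    """Single elimination seeded 1v8, 4v5, 2v7, 3v6; returns the winning deck."""
    bracket = [top8[i] for i in (0, 7, 3, 4, 1, 6, 2, 5)]
    while len(bracket) > 1:
        bracket = [a if rng.random() < win[a][b] else b
                   for a, b in zip(bracket[0::2], bracket[1::2])]
    return bracket[0]


def simulate(field, win, rounds, trials, seed=0):
    """Return (top 8 share, win share) per deck averaged over `trials` events."""
    rng = random.Random(seed)
    top8_seats, titles = Counter(), Counter()
    for _ in range(trials):
        top8 = swiss_standings(field, win, rounds, rng)[:8]
        top8_seats.update(top8)
        titles[play_top8(top8, win, rng)] += 1
    top8_share = {d: top8_seats[d] / (8 * trials) for d in field}
    win_share = {d: titles[d] / trials for d in field}
    return top8_share, win_share


if __name__ == "__main__":
    top8_share, win_share = simulate(FIELD, WIN, rounds=8, trials=200)
    for deck in FIELD:
        print(f"{deck:9s} top 8 {top8_share[deck]:.1%}  win {win_share[deck]:.1%}")
```

With only a few hundred trials the shares will bounce around from seed to seed, so treat any single run as noisy. The numbers in Chart 1 used 1000 trials.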

This chart shows what the above percentages mean from the perspective of a specific individual. For example, if I told you the result was that Paper and Rock both had 33% of the top 8 you might think there was no advantage to picking one deck. But, there were different starting positions for the two strategies. While we might expect there to be 2.66 players for both strategies in the top 8, there were more people who started with rock than paper, thus showing up with rock is worse than paper for each individual rock player.

Chart 2. Player Value

| Representative Tournament | Original Metagame (# Players) | # Players in Top 8 (Lucas) | # of Winners |
|---|---|---|---|
| Rock | 120.00 | 1.20 | 0.14 |
| Paper | 80.00 | 4.32 | 0.18 |
| Scissors | 100.00 | 2.48 | 0.68 |

In this case we can see that although roughly an equal number of tournaments are won by Paper and Rock, you personally are much more likely to win if you play paper (since fewer of them exist).
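
The per-player view is just the expected number of finishers divided by the number of people who showed up with that deck. A small sketch using the Chart 2 numbers (the function name is mine, for illustration only):

```python
# Numbers copied from Chart 2 (representative 300-player tournament).
PLAYERS = {"Rock": 120, "Paper": 80, "Scissors": 100}
TOP8_SEATS = {"Rock": 1.20, "Paper": 4.32, "Scissors": 2.48}
WINNERS = {"Rock": 0.14, "Paper": 0.18, "Scissors": 0.68}


def per_player_chance(expected_finishers, players):
    """Chance that one specific pilot of each deck achieves the finish."""
    return {deck: expected_finishers[deck] / players[deck] for deck in players}


top8_chance = per_player_chance(TOP8_SEATS, PLAYERS)
win_chance = per_player_chance(WINNERS, PLAYERS)

for deck in PLAYERS:
    print(f"{deck:9s} top 8 {top8_chance[deck]:.2%}  win {win_chance[deck]:.3%}")
```

On these numbers a Paper pilot top 8s about 5.4% of the time against 1.0% for a Rock pilot, and wins roughly twice as often (about 0.2% against about 0.1%), even though the two decks take down a similar number of tournaments.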

Metagame Rule 3: Whether a deck is good depends not on how much of the top 8 it expects to be, but on whether the deck is increasing its metagame percentage from the initial position and by how much it does so. You would rather be 5% of the initial meta and 10% of winners, than part of a deck that was both 50% of the Meta and 50% of winners. This is important for when we analyze the actual Pro Tour.

3. Pro Tour: Return to Ravnica

Assumptions for Simulation
  • There were 382 players who participated in every round
  • The tournament was 10 continuous rounds of Modern and had no limited portion
  • The win percentage matrix (see below)
  • The metagame consisted of all decks with an initial metagame share greater than 1.5% and all other decks are lumped into other
The Win Percentage Matrix is a table of every deck which describes the probability that the deck on the Y-Axis beats the deck on the X-Axis. For Rock-Paper-Scissors it looks like:

Chart 3. Win Percentages RPS

| Win Probability | Rock | Paper | Scissors |
|---|---|---|---|
| Rock | 50% | 0% | 100% |
| Paper | 100% | 50% | 0% |
| Scissors | 0% | 100% | 50% |

For PT RTR it looks like:
Chart 4. Win Percentages Modern



Some of this was filled out using Paul Jordan’s metagame article and some of it was just my subjective best guess. I tried two different versions. In version one I used my best guess for rough win percentages. In version 2 (the one from above), I tweaked version one until the probability of winning a random match was equal (or close) to their actual total win percentage at the Pro Tour (again from Paul’s article). The big difference comes in how the decks are treated when they play the nebulous “other” decks. We can see based on these assumptions the overall win percentage and how it compares to what actually happened. This is simply a sanity check.

Chart 5. Comparing Assumed (based on above) vs Actual Win percentages



Notes from the previous 2 charts
  • Affinity and Scapeshift have very good matchups against Jund, but not great matchups elsewhere
  • The two most popular decks have below average win percentages (Jund and Other)
  • Eggs has pretty much universally favorable matchups
  • Poison has a high overall win percentage but a poor Jund matchup
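
An overall win percentage of this kind is the win percentage matrix weighted by the metagame: the chance of winning a match against a randomly chosen opponent. The Modern matrix only lives in the chart, so this sketch uses the Rock Paper Scissors matrix and opening shares instead. The function name is hypothetical.

```python
# Row deck beats column deck with this probability (mirror is 50/50).
WIN = {
    "Rock":     {"Rock": 0.5, "Paper": 0.0, "Scissors": 1.0},
    "Paper":    {"Rock": 1.0, "Paper": 0.5, "Scissors": 0.0},
    "Scissors": {"Rock": 0.0, "Paper": 1.0, "Scissors": 0.5},
}
SHARES = {"Rock": 0.40, "Paper": 0.2667, "Scissors": 0.3333}


def field_win_rate(deck, shares, win):
    """Probability that `deck` wins a match against a random opponent."""
    return sum(shares[opp] * win[deck][opp] for opp in shares)


overall = {deck: field_win_rate(deck, SHARES, WIN) for deck in SHARES}
for deck, rate in overall.items():
    print(f"{deck:9s} {rate:.1%}")
```

Against the opening RPS field Rock and Paper both come out a little over 53% and Scissors a little over 43%. That number only describes a random opponent from the starting metagame, so it says nothing about what a deck faces once the field has changed in later rounds.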
Simulation Results

This shows us how we should expect the tournament to shake out in terms of composition.

Chart 6.  Metagame and Top 8 Shares

The following gives us an idea of what the best decks are. The added value measure calculates your advantage compared to assuming that each person is exactly equally likely to top 8 (or win) the tournament. 100% means you are twice as likely as someone with no advantage or disadvantage from deck choice to top 8. −50% means you are half as likely. The probabilities are the probability that a given individual would accomplish top 8 (or win). In other words if I chose to play Jund at PT RTR I was giving myself a 0.1% chance of winning, which is distinct from Jund having an 11.8% chance of winning overall.

Chart 7 Finding the Best Deck.
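
Going by the description above, the added value measure is a pilot's chance of the finish divided by the no-advantage baseline, minus one. A minimal sketch under that reading (the function and parameter names are mine, and the chart itself may round differently):

```python
def added_value(p_individual, field_size, slots):
    """Relative advantage over a pilot whose deck gives no edge.

    p_individual: chance that one specific pilot achieves the finish
    field_size:   number of players in the tournament
    slots:        how many players achieve the finish (8 for top 8, 1 for a win)
    """
    baseline = slots / field_size  # everyone equally likely
    return p_individual / baseline - 1


# Twice the baseline is +100%, half the baseline is -50%.
print(f"{added_value(2 * 8 / 382, field_size=382, slots=8):+.0%}")
print(f"{added_value(0.5 * 1 / 382, field_size=382, slots=1):+.0%}")

# The Jund example from the text: a 0.1% chance to win a 382-player event.
jund = added_value(0.001, field_size=382, slots=1)
print(f"Jund win added value: {jund:+.0%}")
```

Plugging in the rounded 0.1% figure for a Jund pilot against a 1-in-382 baseline gives an added value of roughly −62% for winning the tournament.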


My Takeaways
  1. Poison wasn’t hurt much by its bad Jund matchup. The rest of the field slowly whittled Jund down and Poison was able to prey on the decks that were doing that (Scapeshift, Tron, Eggs etc.).
  2. Affinity was hurt because its main source of +EV was disappearing in the later rounds. It also started to form a large part of the meta as the tournament evolved and thus faced increasingly frequent mirror matches (making it harder for individual pilots to succeed).
  3. Only 15% of decks significantly improved upon the benchmark for the purposes of top 8ing.
  4. Edit: 01/29 Eggs was the second best deck choice for the tournament. But it was only the 4th most likely deck to win the tournament. Of course, this ignores the possibility that Cifka's list was better than generic Eggs.
  5. The fact there was so much Jund in the t8 was probably due to above average limited performances (Ochoa 15, Edel 12, Yuya 15)
  6. Storm’s performance (which was fairly solid) suggests a lack of “average” Storm players. More likely there were some people with very good decks and some people with very bad versions.
Metagame Rule 4: In this environment most decks are bad choices. Arguably 85% of decks were below average choices. If the most popular deck is not the “best deck” the metagame decision is very important.

Metagame Rule 5: Being better with your deck (as opposed to trying to metagame) is better when tournaments have fewer rounds. Imagine being a rock player with a 20% win percentage against paper and a 70% win percentage in the mirror. You still wouldn’t want to be in the average top 8 from the first example. In general we probably overestimate the value of being good with your pet deck. This will fall out of the theoretical work I plan to examine later.

Final Thoughts

I am going to add some further analysis based on theory to answer some hypotheticals that some may find interesting such as:
  • How much is being Yuya worth?
  • Does number of rounds matter a little or a lot?
  • Does the theory (which requires even more simplification) agree with the Simulations (spoiler: Science works)?
I would also like to run some numbers on another metagame where the “best deck” was actually the most popular deck. In what I expect will be the mother of all plot twists, I assume the math will say to play the best deck. Not sure where to find that stuff so I will do some digging. Maybe PT: Tempered Steel or Cawblade.

Monday, 14 January 2013

Modern UW Control

List as of PTQ:
2 Scalding Tarn
2 Arid Mesa
4 Celestial Colonnade
3 Seachrome Coast
1 Calciform Pools
1 Plains
4 Tectonic Edge
4 Hallowed Fountain
5 Island

3 Vendilion Clique
1 Snapcaster Mage
1 Restoration Angel
1 Consecrated Sphinx

1 Ratchet Bomb
2 Vedalken Shackles
1 Batterskull
1 Relic of Progenitus
2 Thirst for Knowledge
3 Supreme Verdict
2 Sphinx's Revelation
3 Path to Exile
3 Mana Leak
1 Remand
4 Cryptic Command
4 Spell Snare

SB:
4 Meddling Mage
2 Celestial Purge
1 Ratchet Bomb
1 Disenchant
1 Stony Silence
1 Negate
1 Spell Pierce
1 Path to Exile
1 Wurmcoil Engine
1 Rest in Peace
1 Linvala, Keeper of Silence

MVPs:
Ratchet Bomb
Calciform Pools
Vedalken Shackles

LVPs
C-Sphinx
Stony Silence
Path to Exile

Matchups
Twin 3-1 (loss was to RUG with Blood moons)
Poison 1-0
MonoWhite 1-0
RUG Delver 1-0
Storm 2-0
American Midrange 0-1
Jund 1-0

I apologize for all spelling and grammar mistakes which are sure to follow. They are my mistakes and exclusively the fault of the Canadian Public School system.

I haven't tested the deck much because I brewed it up mostly 1 hour before the PTQ. But I think all those matchups are things I am very happy to play.

The Genesis:
Early in the week I tried Jarvis and Fabiano's list for a couple of matches and wasn't happy with UW in general. I was losing in 2-mans to competent Jund (Cheon and Orsini Jones) and with the more creature heavy lists I felt only even against combo (as opposed to way ahead).

All the Wraths
Saturday evening I quickly 0-2ed with Zoo (went X-2 at last PTQ) and dropped. Waking up Sunday I reviewed the decks at the top tables and realized everyone was playing creatures. Thus I decided I was going to play a bunch of insane cards against creatures: Shackles, Supreme Verdicts and Revelations. This combination of cards is almost impossible for any fair deck to play around (Jund does it the best). Unfortunately you can't really afford to play creatures when you're on this strategy, so you need ways to no-questions-asked close out the game (thus my choice of BOOMBOOM finishers).

Combo Me? Combo you!
I didn't want to play too many creatures (combo with wrath) and I needed to shore up combo matchups since I had many bricks, so I played a crap ton of counters and a maindeck relic. My few early threats were also highly disruptive which helps since you often draw so many prices. V-Clique is also excellent with my boom booms since it easily strips their answer as you curve into your 6 (or 5). Thus I played Sphinx over Sun Titan since I felt I would be able to make it stick more often than previously. The one remand (#whatwouldOriedo?) is NOT an attempt to be cute. I actually think it's worse than mana leak (especially with thirst in your deck), but I wanted to play two cantrips because I suspected 26 lands might be low.

Nice Spell Snares Bro
To alleviate the problem of drawing the wrong part of your deck in the wrong matchups I had the Thirst for Knowledges. Where other people played Jace as a source of persistent card advantage, I wanted to play almost exclusively on their turn and use my three drop card drawer to filter for the insane one-ofs that were pertinent in each matchup. They are also excellent with situational sideboard cards. For example spell pierce is generally mediocre in the twin matchup. Early on you can just lose to blood moon (so it excels there), but the majority of games go long. So the card is pretty bad usually. Similarly you might want to bring in disenchant against RUG delver (in case of shackles or moon) but you don't want it rotting in your hand forever.

The only way Pikula Makes a top 8*.
Meddling mage is the all-star of this board. In every combo matchup you end up post board with essentially 10 disruptive creatures, 12 counterspells, 4 card drawers and some miscellaneous permanent hate. I like 1 Rest in Peace since it's insane against storm (impossible for them to grapeshot you and this deck almost never loses to EtW) and I always bring in one against Jund. Stony silence is obviously #synergy with my shackles, wrath, revelations anti-affinity plan. I think the worst card in the board is probably the 4th path but it's almost certainly a necessary evil. Without snapcasters you don't actually have very much removal.

Mana from the gods
The whole point of the deck was to make shackles work. Thus the 3 seachromes. I am also petrified of blood moon so I made the unusual choice to run 2 Arid Mesas. They usually cost you extra life but I like the insurance. The calciform pools was busy firing off revelations all day and I hated having 6 colorless sources essentially but it was probably the best one. The question isn't 25 or 26 lands. It's 26 or 27.

*this is a joke and Pikula was actually within a match in the Sat PTQ I believe (assuming his MODO nick is the obvious). But rather than be whined at for lack of respect (#whathaveyoudoneetc), I figured I would put an asterisk and save myself some flame-mail. Because words mean things.