toordeforce: July 2013

Monday, 29 July 2013

Final Thoughts on the HoF and the Skill Paradox

A final thought on the HoF.

PT Top 8s in the modern era are worth less than PT Top 8s from earlier in Magic’s history. This is in spite of the average modern player being much more skilled.

Note: what I am writing about is an adaptation of well-known theory from investing, sabermetrics and poker. For those interested in a more detailed analysis, I suggest reading Mauboussin and googling the ‘Skill Paradox’.

The Paradox of Skill

We start with the fairly simple assumption that

Performance = Skill + Luck

We also assume each person’s luck is drawn each tournament from some distribution that is the same for all players. E.g. LSV might have been luckier in PT Kyoto than Nassif because he opened Nicol Bolas at that specific tournament, but they had equal chances to open it.

Because a person’s skill and luck are uncorrelated, we arrive at the Paradox of Skill:

Variance of Population Performance = Variance of Skill in Population + Variance of Luck in Population

As the variance in skill gets smaller, the variance associated with luck starts to dominate in determining the overall outcome of tournaments.

Consider the following example:

A) A PT in 1999, where Jon Finkel is far and away the best player. The 100th best player barely knows how to draft and the rest of the population is somewhere in between.

B) A PT in 2013, where the top 100 players are all equal in skill to Jon Finkel at his 1999 level.

It should be clear that someone’s final position in PT A is strongly correlated with skill. In other words, we can be confident that the person in 8th was better than the person in 16th.

In PT B, the opposite would be true. The only difference between someone who gets 8th and someone who gets 15th is the amount of luck they had in that specific tournament.
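To make the two scenarios concrete, here is a minimal simulation sketch. It assumes skill and luck are independent normal draws, which is stronger than anything claimed above; the spreads, the field size and the function names are illustrative choices, not measured values.

```python
import random
import statistics


def simulate_field(skill_sd, luck_sd=1.0, players=1000, seed=2013):
    """Draw one tournament: performance = skill + luck for every player.

    Assumption: skill and luck are independent normal draws. The spreads
    are hypothetical and only meant to contrast a wide and a narrow field.
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, skill_sd) for _ in range(players)]
    performances = [skill + rng.gauss(0.0, luck_sd) for skill in skills]
    return skills, performances


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# PT A: skill varies a lot relative to luck. PT B: everyone is about equally good.
corr_wide = pearson(*simulate_field(skill_sd=3.0))
corr_narrow = pearson(*simulate_field(skill_sd=0.3))

print(f"PT A (wide skill spread):   corr(skill, performance) = {corr_wide:.2f}")
print(f"PT B (narrow skill spread): corr(skill, performance) = {corr_narrow:.2f}")
```

Under these assumptions the correlation between skill and performance should come out high in the wide field and low in the narrow one, even though the luck distribution is identical in both.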

Assumptions I am using:

1) The skill dispersion (especially at the top of the game) is much lower today than it was historically. In other words, the top 50 players in the game are much closer today (even if they are all much better) than they were historically.

And that’s it. Everything I have read from Kai, Finkel and Kibler on the topic would seem to support the view, but I haven’t bothered to try and prove the above assumption.

Just to reinforce that this situation isn’t completely impossible: in a world where the top 50 players attend 3 PTs a year and each has a 10% chance to Top 8 any given PT, we would still expect about one person to Top 8 two PTs in a year. In other words, the fact that some players do consistently well isn’t enough to disprove assumption 1. If you have ever heard or read about the birthday paradox, the same principles apply.
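The arithmetic behind that claim can be checked with a short binomial calculation. This is a sketch under the stated numbers (50 players, 3 PTs, 10% each), and it additionally assumes that results are independent across tournaments and across players, which is only an approximation since there are just 8 slots per Top 8.

```python
from math import comb


def prob_at_least(successes, trials, p):
    """Binomial tail: chance of at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


PLAYERS = 50        # top players, all assumed equally skilled
PTS_PER_YEAR = 3
P_TOP8 = 0.10       # each player's chance to Top 8 any single PT

# Chance that one given player Top 8s at least two of the three PTs.
p_repeat = prob_at_least(2, PTS_PER_YEAR, P_TOP8)

# Expected number of such players, and the chance that at least one exists,
# treating players as independent (an approximation).
expected_repeaters = PLAYERS * p_repeat
p_any_repeater = 1 - (1 - p_repeat) ** PLAYERS

print(f"P(one player gets 2+ Top 8s)  = {p_repeat:.3f}")
print(f"Expected players with 2+      = {expected_repeaters:.2f}")
print(f"P(at least one such player)   = {p_any_repeater:.2f}")
```

Each player is only about 2.8% to pull it off, but with 50 equally skilled players the expected number of repeat Top 8ers is about 1.4, and there is roughly a three-in-four chance that at least one of them looks "consistent" in any given year.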

Practical Implications

We are seriously overweighting T8s and wins for modern-era competitors. Instead we should focus on a looser metric (e.g. Top 32s/64s). Rate metrics and consistency become much more important. For older players, a Top 8 is more likely to imply that they were one of the best 8 players in the tournament, and a Top 32 is more likely to imply that they were NOT one of the top 8.

Recently, in his SCG article, Reid Duke made the point that we shouldn’t punish anyone for having a few bad initial years on the PT. And I really wanted this to be true (because me, obv). But if we now know that luck is the major determinant of people’s short-term success rates, things like 3 Yr medians should mean less for modern competitors. Forgiving a few “bad years” makes it more likely you select someone whose results are variance driven (as opposed to skill driven).

Putting this together, I think the HoF implications go as follows. Suppose someone has 2 PT Top 8s and 6 PT Top 16s. In the modern age, I think “he’s unlucky”. If he is old school, I think “he probably wasn’t that good”.

Focusing on results through this lens I think we could argue:
Underrated (in no particular order):

1) Shouta Yasooka

2) Hoaen (do we consider him “modern”?)

3) Osyp

Overrated
1) Edel.

2) Saito (if “Modern”)

3) Ikeda

4) Gary

Final Unrelated HoF Thoughts:

Stats I used in my previous formula driven HoF Ballot:
Longevity = # of PTs, # of Pts

Consistency = PT Median, 3 Yr Median, Difference in Medians, T16s, GP Top 8s

Best in World = 3 Yr Median, POYs

Place in History = Top 8s, Money List, GP Top 8s, Pro Points. These are indicator variables (e.g. are you in the top 20%?). In other words, having 4 PT Top 8s counts the same as having zero, because 80% of ballotees had 4 or fewer.

Skill = T16s per PT, Median Finish, POYs per years played.
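As a rough illustration of the indicator idea used for Place in History, the sketch below flags only the values above the 80% mark of the field. The cutoff rule, the function name and the sample counts are all hypothetical; they are not the actual ballot data or necessarily the exact rule used for the ballot.

```python
def top_share_indicator(values, cutoff_percent=80):
    """Return 1 for values strictly above the cutoff percentile, else 0.

    Assumption: the cutoff is the value at the `cutoff_percent` mark of the
    sorted list, and ties with the cutoff score 0.
    """
    ranked = sorted(values)
    cutoff_index = len(ranked) * cutoff_percent // 100 - 1
    cutoff = ranked[cutoff_index]
    return [1 if value > cutoff else 0 for value in values]


# Hypothetical PT Top 8 counts for ten candidates (not the real ballot data).
pt_top8s = [0, 1, 1, 2, 2, 3, 4, 4, 5, 7]
indicators = top_share_indicator(pt_top8s)

for count, flag in zip(pt_top8s, indicators):
    print(f"{count} Top 8s -> indicator {flag}")
```

With this sample, a candidate with 4 Top 8s gets the same indicator as one with zero, which is the behaviour described for this category.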

My Ballot (which I don’t have):

1. LSV

2. Edel + Ikeda

These are the only two who are not top 5 stat-wise. I think pioneers in a field deserve credit, and I am willing to go beyond the stats if there is proof they did something truly unique. I feel the case for Ikeda is weaker than the case for Edel (he has more similar analogues in Fujita, Oishi etc.). I could be convinced to vote for Osyp (easily the most underrated candidate on the ballot) instead.

3. Shouta Yasooka.

Stats + Skill Paradox already implied he was one of the best players skill wise on the ballot. Juza’s interview on cfb was a nice (if unnecessary) confirmation.

4. Saito.

1. He was the best deckbuilder in the world (or at least top 3) for a long period of time. Still seems like he might be.

2. He is one of the best players I have seen play. I can sometimes remember individual matches where I was blown away by the play I saw. Saito in TSP block is amongst those. Ditto San Juan. Most players I have talked to feel that he was easily amongst the best when he played.

3. He was an angle shooter, but a lot of people on the PT are. Stalling in particular seems like one of the most hypocritical things for many players to call someone out on (based on my PT experience). So while he might be the scummiest of successful players (which I doubt), other players are close to that level. This might be too much apologizing for someone who is arguably a cheater (I differentiate between rules lawyers/cheaters/angle shooters), but I don’t believe (based on 1 and 2) that his results were significantly impacted by his angle shooting.

The honourable mentions: BenS, Efro, Gary.

Tuesday, 16 July 2013

One Game

It can be hard to figure out exactly how good you are.

You can play a whole game and make zero interesting decisions.

Or you could spend eight turns finding out you have a long way to go.

Three friends went to GP Providence. We had practiced a lot. We had a history of some success (2nd at the last Team GP, etc.). But this isn’t a feel-good tournament report. And it isn’t an appeal for pity.

I have finally found some time (and my notes) to do some honest reflection on facing two (maybe 3) future hall of famers; and then being weighed, being measured and being found wanting. Not a tournament report. Not a match report.

So don't call this a report so much as a story about one game against the best in the world.

Before the 2nd draft on Day 2, we were 9-3. That’s not the end of the world, but it is not a great place to face Cheon, Froelich and LSV.

I was summarily dispatched by LSV in the middle seat. Jamie beat Paul, which meant it would be Maksym vs Efro for all the marbles. In game 2, we played well and managed to find all the right attacks. It was one of those games where we didn’t necessarily outplay our opponent, but rather managed not to snatch defeat from the jaws of victory.

I think most grinders would know the feeling.

So it’s time for game 3. The good news is that Maksym’s deck had Aetherling, Pack Rat and Soul Ransom. The bad news is that last year their team had more pro points than our lifetime totals combined. We were fighting the civil war of Ratinum. Efro’s deck was an aggressive Boros deck splashing blue for Ral Zarek and Beck//Call. We knew about at least one Weapon Surge and were on the draw for game 3.

Jamie and I do a mental high five when they take their mulligan and we see a rat. At least I think we do. Jamie’s probably too nice a guy to revel in our opponent’s misfortune, but I hate imagining myself on the solo end of a high five.

Grade: A++. Opponent on 6. We have turn 2 rat with 3 lands in hand. Played this part perfectly.

Grade: A. Nothing to screw up. Yet.

There are some small set of scenarios where not playing Pack Rat is correct (and Maksym broached the topic). However, against an aggressive deck I don’t think you can possibly afford to be that cautious.

Grade: A. Didn’t punt by not playing rat. No victories are too small for this story.

TURN 3

At 14 life we face our first real decision.

3) Should we spend a turn making rats or play a barrier?

3a) If we make a rat can we afford not to block?

If we don’t block next turn we will be at 10 (if he plays another creature), or 6 if he just double pumps. After that we will have three 3/3s, but his Truefire Paladin is an abyss and his other guys are trading for rats.

3b) So we have to block if we make a rat. What are optimal blocks?
Presumably we would just block Firstblade. A trick gets really costly here since we would be at ~10 with one rat facing 2 creatures. And again Truefire is close to abyss mode (assuming a 4th land).

3c) What’s the goal here?

We have Soul Ransom (he mulliganed) and tons of gas. So we just want this game to go as long as possible, which means preserving life even if that means throwing away cards.

Hover Barrier makes the most sense in this context. It’s going to be especially good if his 4th land doesn’t allow for double pumps (e.g. isn’t a Mountain), a reasonable guess given his mulligan and being on the play.

Grade: A. Found the important strategy for the game.

TURN 4

Cluestone gives him double pump mana. Truefire gives a way to grow an army that could potentially fight rats. The whole game is going to shit. But he only has one card in hand and we have Soul Ransom. We could also Fatal Fumes here. Millennial Gargoyle, Call of the Nightwing and making Pack Rat #2 all don’t do enough defensively.

4) Should we Fumes or Soul Ransom?

4a) Assuming Fumes, what’s the optimal target?

We can’t afford to let him have Guildmage in the long game and Soul Ransom isn’t a permanent answer. So we would have to Fumes the Guildmage. He then attacks with both.

4a – II) If he attacks with both what do we block?

Chumping with rat seems inadvisable (but maybe we should have considered it), so the question is where to put the Barrier. Paladin would kill it, setting us up with a Pack Rat vs his board of two creatures and being at 10. We would ransom Paladin, he would discard two and we would still be at 10 and have to chump with Pack Rat or go to 2. Not a winnable board state.

If we block the Viashino, he pumps twice we go to 6 and Soul Ransom his Truefire. He discards and we put Hover Barrier in front. Leaving us with a rat at 4 life versus his two creatures. Not a winnable board state.

4b) What about Soul Ransom? Optimal Target?

I think it’s safe to assume he is going to crack the Soul Ransom to get back whatever we take. If he gets it back immediately, taking the Truefire is better since he can’t attack right away and the Paladin isn’t useful while summoning sick. If he is holding a good card (or draws one), he might wait a turn or two to crack it, in which case taking the Guildmage is better. I didn’t want to give him option value (e.g. the ability to draw cards or just make dudes), so I suggested we take the Sunhome Guildmage.

Grade: B. Not playing the Fatal Fumes is good and not an obvious line. In retrospect, taking the Guildmage might have been bad, since we can always Fatal Fumes it the next turn if he decides to wait.

TURN 5

After Efro plays Goblin Rally, it’s obvious he’s setting up to get his Guildmage back next turn.

5) Should we make a play, mainphase fatal fumes or hold up fatal fumes?

5a) Can we afford for him to get guildmage back?
No.

5b) So mainphase or wait for him to discard?

The first question is the interaction between Soul Ransom and Removal. Short answer is we get to draw 2. But, we had to ask a judge to confirm. Luckily LSV seemed to get the wrong read here (based on us asking the judge question). Maybe he assumed we knew basic rules interactions. Joke’s on him.

5b2) What happens if we wait, they figure it out and do nothing?

Well we have pack rat so our mana won’t really be wasted. And they won’t be able to attack. Seems like waiting is fine.

Grade: A-. I think we made the right play, but it should have been obvious that we had removal because we had to talk to a judge. A massive leak which better players would avoid.

TURN 6

On his turn 6, Efro discards two cards and we respond with Fatal Fumes. I have listed the 3 cards drawn on our turn 6 (two from Soul Ransom). He still gets to attack his board into our Rat + Barrier. We could also chump with a rat.

6a) Who are we blocking with Hover Barrier?

If we don't block Truefire, we go to two life. We would also be facing 6 creatures, with 4 potential blockers. So we need to block Truefire Paladin.

6b) Should we chump with rat?
We need to start making creatures at this point and Rat can make 2 a turn. Can’t afford to chump block (on Efro’s T6).

For our turn 6 making two rats is the only way to make two blockers and not die. He has 6 attackers and we have 3 blockers during his turn 7.

Grade: A. Made all the right plays, though it is not like there were real decisions.

TURN 7

After he attacks with everything (4 tokens, Firstblade, Truefire), we make 2 rats, going up to 3 total, and chump + kill 2 tokens. On our turn we face a bunch of possibilities given that we have 2 rats in play.

7) What are the options?
Plan to make two more rats on his turn (while holding up Cancel). Suppose we make the third rat and block everything but one token. He can either pump (+ first strike) his Paladin or not. If he does, we make the 4th rat, going to one life but ending up with 3 rats. If he keeps his mana up, we can trade boards and have Cancel for his threat, followed by a threat. We can beat a burn spell (assuming it costs more than 2 mana) with this line.

Alternatively we can play a land and cast CotN.

7a) Why cast Call of the Nightwing (CotN)? Why Not?

He can’t block the ciphered rat (because we can make a third rat in combat). We end up with 2 bats, 2 rats (1 untapped) and the ability to make 1 more rat, but no Cancel. Our ciphered rat is unlikely to get in again. This is fine if he draws nothing.

However, we lose to burn and maybe top-decked tricks. We are also lower on cards in hand (because we need to make another land drop), so in a stalemate we could conceivably lose given his abyss Paladin.

Jamie and I thought we could afford to play around top decks (and hold up cancel).

Maksym wanted to CotN and try and end the game. Maksym was losing a game with turn 2 Pack Rat, so we overruled him. Just kidding. Kind of. Fuck Karma.

Grade: B. Upon further reflection I think it is definitely a close call. Also an important note was that his land didn’t make red.

TURN 8

With zero cards in hand. Untap. Upkeep. Efro draws his card.

Looks at LSV.

Cheon ~ “We have to attack or eventually his rat will get us”.

Lucas - “100% they drew weapon surge”.

Obviously we go into the tank.

8a) Could this be a bluff?

Very unlikely. If we didn’t have Cancel, we would essentially be forced to make two rats and quad block. This goes very poorly for them if they don’t have anything (the board becomes our 3 rats versus their Paladin + one token). It’s probably worse than just sending Paladin. And since we are dead, we can’t really afford to play around anything.

This just reinforces the Weapon Surge read. I would like to think they give us enough credit to realize that bluff here doesn’t work. On the other hand the way they Hollywooded before attacking is a signal they aren’t giving us too much credit.

8b) Then what Sherlock?

Well we have to make a dude because we are dead without 3 blockers.

Lucas – “First things first, make a third rat”.

Sometimes you need to be precise. To be honest, I hadn’t even thought about what to discard. It was obvious to me that we needed the first rat, and I wanted to take an action to buy more thinking time.

Except we needed to think first, because what we discard is important.

Unfortunately we discarded Deathcult Rogue.

8c) Can we make 4 guys and block?
Not if we actually believe he has weapon surge since he can plague wind us.

8d) What happens if we block only 3 guys?

He can weapon surge or use 2 abilities from Paladin, but not both.

If he decides to surge, then we cast Cancel, he makes paladin a 4/2. That ends with us at 1 life facing a token. Him with zero cards, but we would have CotN and Deathcult Rogue. Pretty good spot.

Except we discarded Deathcult Rogue. So we would have Island and CotN. When he attacks with the token, we have to chump with a token. And it’s a topdeck war with us at 1 life. He has a Cluestone he can crack to find an extra card as well. That isn’t great for us.

If he doesn’t cast Weapon Surge and instead makes a 4/2 first strike, we can make another rat. He loses a goblin token and a Viashino Firstblade. We are at 1 life, but have 3 rats. Even better than above.

8e) OK, so assuming he plays correctly (and Weapon Surges), what do we do now that we have discarded Deathcult Rogue?

Then the doubt creeps in.

What made me so sure he had drawn weapon surge? Obviously a snap read is based on intuition but if you put a gun to my head how sure would I have been really? 70%? 90%? How likely are we to win the games where he is actually bluffing, and we just call?

Some people would tell you the pressure was overwhelming or they felt the world on their shoulders. But it was nothing so dramatic. My lucky history in Magic has given me a wealth of experience in being embarrassed during feature matches.

Instead I gave my team “the speech”.

Lucas: “We fucked up. We are probably 10-20% to win if we play around my initial read. We are close to 100% to win if we don’t play around it and my read was wrong.

What do you guys want to do?”

Them: YOLO.

Oddly enough this seems to primarily be the refrain of those in the process of committing suicide.

We would be no exception.

Final Thoughts:

They had the weapon surge. We lost.

We played a game for 8 turns (7 on our side) and made at least 3 mistakes, 2 of which may have cost us the match.

We played a turn 2 pack rat and lost.

Because we made the perfect read against one of the best teams in the world, we had a chance to win even when they were drawing pretty well.

It’s unfortunate that Magic chose that moment in time to be a skill game.

FIN.

Friday, 5 July 2013

HoF Voting. Quant Style.

I don’t want to get into an argument about the use of intangibles or subjective achievements (penalties as well) in HoF voting. This is just going to be a simple explanation and presentation of a mathematical approach to judging deservedness. It’s not necessarily how I would vote exactly, but I think it’s a more honest method than most.

1. The first cut.

Due to time constraints (aka lazy constraints) I only considered players with 5 or more Top 16s. The cut is somewhat arbitrary, but it left me with 25 candidates, and it seems like below that number you would have to rely on subjective arguments anyway (e.g. the Pikulas and Herberholzes of the world).

Once I did this, I did all analysis WITHOUT names attached, to remove as much bias (during the methodology creation) as possible.

2. The meat of the method.

I created 5 super categories. Each one has multiple components. Then I gave a weight to each of these categories. This creates an overall score for each player, and the top 5 scores were reported. I like this methodology since it can tell you where a player has a deficit or a strength. If you disagree with my weights, it is simple enough to see how the rankings change based on your own personal preferences. For example, Lauren Lee doesn’t think consistency should matter, whereas my friend Sam seemed to think it much more important.

Below I list the 5 categories, the weight assigned to each, an example of a subcomponent and some discussion of players who excel or fail in the category. Finally I might add some color commentary.

Note some weights changed since I posted on facebook based on discussions with people I respect.

Longevity (10%):

How long was the player at a high level of Magic? The simplest subcomponent is # of PTs played.

Top 3 (always in order)
Ikeda, Yasooka, Stark

Bottom 3 (no order unless mentioned)

Krempels, Justice, Soh/Kaji.

This is one of the places where Justice gets really punished. If you put little weight on Longevity, I think it’s hard to argue that he shouldn’t get in.

Consistency (15%):

Was the person consistent at the highest level of play? PT Median Finish is here. Note this is somewhat independent of how long the player played.

Top 3)
LSV, Efro, Osyp/Mori

Bottom 3)

Jurkovic, Tiago Chan, Geoffrey Siron

I am not sure that consistency should be that important. If someone was bad early on in their PT career, but became dominant I could fully imagine they belong in the HoF.

Best in World (25%):

Could we consider the player amongst the best in the world during some extended period of time? I think it’s hard to argue that someone is among the best of all time if you can’t even provide evidence that they were the best during their time. 3 YR Median is one of the subcomponents.

Top 3)
LSV (Get used to this), Saito, Wafo-Tapa

Bottom 3)
Rietzl (Booooo), Ikeda (first good argument for why he shouldn’t be in), Jurkovic

Place in History (25%): How unusual is their resume? Do they have something that really stands out, something that makes you say “Wow, that would be hard to do”? How many standard deviations above the mean are their stats?

Top 3)
LSV, Saito, Yasooka (15 GP Top 8s, 16th on Money List, insanely high pro points)

Bottom 3)

Krempels (no idea who he is for good reason?), Tiago Chan, Justice (0 GP Top 8s, almost no pro points).

Again this tries to quantify place in history, I know lots of people would argue Justice should be higher but I need a data point and I don’t have one.

Skill (25%): This is probably the most controversial category. I think even if you had low skill but had results from the above categories, you might deserve to be in. %Top 16s is an example of a skill stat.

Top 3)
Justice, LSV (now I wanna see a Justice/LSV grudge match), Efro

Bottom 3)

Tiago Chan, Ikeda, Fabiano

4th was a tie between Johns/Kaji.

The Final Top 10 (with scores; lower is better):

LSV (1.65)
Saito (6.4)
Yasooka (7.55)
Efro (7.85)
Osyp (8.1)
Gary (8.1)
Stark (8.45)
Wafo-Tapa (8.85)
Mori (8.9)
Johns (9.05)
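For anyone who wants to try their own weights, here is a minimal sketch of how an overall score like the ones above could be computed. It assumes each category yields a rank (1 = best) and that the overall score is the weighted average of those ranks, which is only one reading consistent with "lower is better"; the function name and the two player profiles are hypothetical, not the real ballot numbers.

```python
# Category weights as listed in the post.
WEIGHTS = {
    "longevity": 0.10,
    "consistency": 0.15,
    "best_in_world": 0.25,
    "place_in_history": 0.25,
    "skill": 0.25,
}


def overall_score(category_ranks, weights=WEIGHTS):
    """Weighted average of per-category ranks (1 = best), so lower is better.

    Assumption: the post does not spell out the formula; a weighted average
    of ranks is one reading that fits "lower is better".
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * category_ranks[name] for name in weights)
    return weighted / total_weight


# Hypothetical rank profiles, not the real ballot numbers.
all_rounder = {"longevity": 4, "consistency": 1, "best_in_world": 1,
               "place_in_history": 1, "skill": 2}
short_career = {"longevity": 24, "consistency": 5, "best_in_world": 12,
                "place_in_history": 23, "skill": 1}

# Re-weighting: ignore Longevity and Place in History entirely.
no_history = dict(WEIGHTS, longevity=0.0, place_in_history=0.0)

for name, ranks in [("all_rounder", all_rounder), ("short_career", short_career)]:
    print(name, round(overall_score(ranks), 2), round(overall_score(ranks, no_history), 2))
```

Dividing by the total weight keeps scores comparable after re-weighting, so a short-career, high-skill profile improves sharply once Longevity and Place in History are set to zero.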

What do the rest of the top 10 have to do to move into the top 5:

Gary – literally anything to break the tie with Osyp.

Stark – Scores worst in Skill (low Median) and BiW (low 3 Year Median), which I am sure many would disagree with. Honestly, just weighting those two areas slightly lower and he’s in.

Wafo-Tapa – Consistency and Longevity were his two weakest areas and the ban definitely didn’t help that.

Mori – Skill and BiW need improvement.

Johns – Longevity is far and away his worst score. Hard Time to start PTQing IMO.

Justice – If you put 0 weight on Longevity/Place in History, Justice becomes a slam dunk candidate (2nd to LSV, with Saito falling to 8th in this case).