Thread: Sad but true: Match-up estimations gone wild

  1. #1
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    [article] Sad but true: Match-up estimations gone wild

    When you ask 24 Merfolk players to guess what percentage of games (including sideboarded games) Merfolk wins against Belcher combo, you get answers ranging from 40% to 95%. Why is that? Do they play different decks? Probably not. Don’t they know the match-up? I’d say most of them do. This raises the question of whose estimates are accurate. If some players have difficulty estimating the match-ups of “their” decks correctly, how can players who have never played Merfolk or Belcher combo make such estimates? Still, it is common knowledge that the MU in this example favors Merfolk, and most people seem to know that, since the average estimate was 68%. In the following article I will explain that it takes more than experience alone to make good estimates about certain match-ups.
    This article presents the results of an analysis of certain MUs of the most common legacy decks. I will structure the article as follows:

    1. Method (how I collected the data)
    2. Results
    3. Applying the results
    4. Contributions and limitations


    Before you start reading: please note that I do not want to offend anyone. Nor do I claim that this article holds the “absolute truth” about legacy. I’m an ambitious player, just like most of you, who favors statistics when it comes to estimating match-ups. I want to give the reader a more detached view of some topics that are commonly discussed among Magic players.

    1. Method

    In January 2010 I picked 20 decks that “defined legacy and won’t disappear in the course of the year 2010”. The idea was that I didn’t want to collect data about decks that would soon disappear. I decided on the following:

    - Dragon Stompy
    - Goblins
    - Merfolk
    - Zoo
    - Eva Green
    - Aggro Loam
    - The Rock
    - Threshold
    - Team America
    - Survival
    - Counter-Top Bant
    - Dreadstill
    - Enchantress
    - Lands
    - Landstill
    - White Staxx
    - UBx Stormcombo
    - Belcher
    - Dredge
    - Reanimator

    Note that this selection is questionable – I will return to that point in the 4th part of the article.
    With the banning of Mystical Tutor, Reanimator seemed to vanish, so I cut it from my analysis. I added Painted Stone and Mono Black.

    Collecting data

    In the following part I will talk about “games” and “matches”. For me, 1 match consists of up to 3 games. Thus, 2-1 is one match with 3 games.
    What I wanted to measure is the percentage of games (including sideboarded games) that Deck A wins against Deck B. An example would be: Merfolk 40% - Goblins 60%. This means that Goblins wins 6 out of 10 games against Merfolk. I chose this design because it is commonly seen on forums, where players express their estimates in this “percentage fashion”.
    Therefore I needed pairings and their actual outcomes, e.g. Merfolk 1 – Goblins 2 (meaning that Goblins wins this match 2-1 after 3 games, including sideboarded games).
    I collected the data through (1) personal observation of matches I saw at tournaments, (2) recording my own results and (3) analyzing the spreadsheets of 6 Star City Games Open legacy tournaments, published by Jared Sylva. Of course the latter provided the largest part of the information. Thanks for that, Jared!

    Calculating results

    After collecting the data I calculated the percentage as shown in the following example:

    Let’s take an easy one: Goblins vs. The Rock
    The recorded results: 2-0; 2-1; 2-0; 2-1 and 2-1
    This makes a total of 5 matches with 13 games
    Goblins wins 10 out of 13 games
    The Rock wins 3 out of 13 games
    Therefore it’s 10/13 ≈ 0.77 --> 77%

    The MU is in Goblins’ favor: 77%-23%

    By recording and calculating the results in this particular fashion we know how sideboarding changes the results. Another way to calculate the results would be:

    Goblins win 5 out of 5 matches. Thus it’s 5/5 = 100%
    Result: the MU is in Goblin’s favor: 100%-0%

    This calculation obviously gives a wrong impression of the MU and is therefore useless. That’s why I decided to go with the first one.
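
The two ways of counting can be compared in a small Python sketch. It is only an illustration of the arithmetic described above; the function name `game_and_match_share` and the tuple layout are assumptions made for this example, not part of the original spreadsheet.

```python
def game_and_match_share(results):
    """results: one (games won by deck A, games won by deck B) pair per match."""
    games_a = sum(a for a, _ in results)
    games_b = sum(b for _, b in results)
    # Drawn matches (e.g. 1-1) are left out of the match-based share.
    matches_a = sum(1 for a, b in results if a > b)
    decided = sum(1 for a, b in results if a != b)
    game_share = games_a / (games_a + games_b)
    match_share = matches_a / decided
    return game_share, match_share


# Goblins vs. The Rock, from Goblins' perspective
goblins_vs_rock = [(2, 0), (2, 1), (2, 0), (2, 1), (2, 1)]
game_share, match_share = game_and_match_share(goblins_vs_rock)
print(f"game-based share:  {game_share:.0%}")   # 10 of 13 games
print(f"match-based share: {match_share:.0%}")  # 5 of 5 matches
```

The game-based share keeps the information that The Rock took a game in three of the five matches, which the match-based share throws away.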

    2. Results

    The results are listed in a cross table. The table holds only percentages of MUs with 10 or more recorded matches (thus at least 20 games) – in order to make the results more significant.

    For everyone who does not want to download the table, here are the results that I found most striking:

    Merfolk: 45% - Goblins: 55%
    2-0 XXXXX.XX
    2-1 XXXXX.XX
    1-1
    1-2 XXXXX.XX
    0-2 XXXXX.XXXXX.XXX
    This means that Merfolk wins 45% of the games (including sideboarded games) against Goblins, while Goblins wins most of its matches 2-0. Each X stands for one recorded match that ended with that score, seen from the first deck’s perspective. I will report the following examples in the same fashion, only without explanations.

    Lands: 40% - Goblins: 60%
    2-0 XX
    2-1 XX
    1-1 XX
    1-2 XX
    0-2 XXXXX.


    UBx Stormcombo: 49% - Goblins: 51%
    2-0 XXXXX.
    2-1 XXX
    1-1
    1-2 XXXXX.XXXXX.
    0-2 XX

    UBx Stormcombo: 29% - Merfolk: 71%
    2-0 XXX
    2-1 XXXX
    1-1
    1-2 XXXXX.XXXXX.X
    0-2 XXXXX.XXXXX.XXXXX.XXX

    Belcher: 38% - Merfolk: 62%
    2-0 XXXX
    2-1 XX
    1-1 X
    1-2 XXXXX.XXXXX.XXX
    0-2 XXXXX.XXX

    Lands: 49% - Zoo: 51%
    2-0 XXXXX.
    2-1 XXXX
    1-1 XXXXX.XX
    1-2 XXXXX.XXX
    0-2 XXXX

    Belcher: 60% - Zoo: 40%
    2-0 XXXXX.XXXXX.X
    2-1 XXXXX.XXXX
    1-1 X
    1-2 XXXXX.XXX
    0-2 XXX

    Mono Black: 83% - Zoo: 17%
    2-0 XXXXX.XX
    2-1 XX
    1-1
    1-2 X
    0-2

    UBx Stormcombo: 43% - CT Bant: 57%
    2-0 XXXX
    2-1 XX
    1-1
    1-2 XXXXX.X
    0-2 XXXXX.

    Belcher: 36% - UBx Stormcombo: 64%
    2-0 X
    2-1 XX
    1-1
    1-2 XXX
    0-2 XXXX

    Here is what I found striking:

    - Merfolk and Goblins seem to be close to even (45%-55%)
    - Goblins does better against Lands than against Merfolk (60% vs. 55%)
    - Goblins does rather well against UBx Stormcombo (51%-49%)
    - Merfolk does worse against Belcher than against UBx Stormcombo (62% vs. 71%)
    - Lands vs. Zoo ends up 1-1 very often in comparison to others (7x!)
    - Zoo does worse against Mono Black than against Belcher (17% vs. 40%)
    - UBx Stormcombo does surprisingly well against CT Bant (43%)
    - Belcher does badly against UBx Stormcombo (36%)

    Some additional information (not listed here, look at table)
    The best MU (among the ones I reported) for

    - …Goblins is CT Bant (63%)
    - …Merfolk is UBx Stormcombo (71%)
    - …Zoo is Enchantress (70%)
    - …Eva Green is Merfolk (52%)
    - …Aggro Loam is Zoo (63%)
    - …CT Bant is UBx Stormcombo (57%)
    - …Lands is Dredge (66%)
    - … UBx Stormcombo is Lands (68%)
    - …Belcher is Goblins (78%)
    - …Dredge is CT Bant (58%)

    The worst MU (among the ones I reported) for

    - …Goblins is Belcher (22%)
    - …Merfolk is Enchantress (33%)
    - … Zoo is Mono Black (17%)
    - … Eva Green is Zoo (37%)
    - …Aggro Loam is UBx Stormcombo (42%)
    - …CT Bant is Goblins (37%)
    - …Lands is UBx Stormcombo (32%)
    - … UBx Stormcombo is Merfolk (29%)
    - …Belcher is UBx Stormcombo (36%)
    - …Dredge is Lands (34%)

    3. Applying the results

    Now that we have all this information, what comes next? Of course we want to learn something from it and maybe even apply it to deck construction, sideboard construction, tournament preparation, playtesting etc.
    Note that this part is again open to criticism, and again I want to point to the 4th part of this article: Limitations.
    Before we start, I would like to introduce a friend of mine: Mr. Average. Mr. Average has played every MU recorded in my table. Therefore the statistics we are discussing are his personal statistics. Mr. Average is the average player with the average decklist, average playing style and average skill. He will be our guide in the following part.
    Let’s look at how to apply the results. Since Goblins is the deck I pilot best, I will give an easy example from Goblins’ perspective that might work for other decks too.
    A common opinion among Goblins players (when it comes to sideboard construction) is not to run any combo hate. We like to reason that “the MU is just too bad and I just make other MUs better by putting other cards in my SB”. With a quick glance at the results, the Goblins player notices that at least part of our combo MU, namely TES, is actually not as bad as expected. Now look at the UBx Stormcombo vs. Goblins tally in part 2: it tells us that Goblins wins most of its matches 2-1. Since Goblins is likely to lose g1, it seems that its percentages rise after sideboarding. A plausible conclusion is that Goblins has rather effective cards to fight combo in g2 and g3 and that this MU is far from being “just too bad”. As a result, Goblins players should consider dedicating some SB slots to combo hate. Similar reasoning can be applied to sideboard (and MD) construction of other decks.
    I am aware that MU philosophy is not only about which cards you have in your SB. As someone in this forum put it: “It’s not the decks that have good or bad MUs, it’s players that have MUs.” Or as someone suggested in a survey of mine: “Player can change a big deal of the MUs.” IMO both are right. Sideboarding and deck construction have to fit a player’s skill and playing style. When applying the results to ourselves we should note that they hold information about Mr. Average. To give another example:
    The deck categories I listed are very broad. Goblins can splash W, G and B. Those splashes push the MUs in certain directions. Someone who doesn’t run G has no access to Krosan Grip and is therefore more vulnerable to Moat/Humility.
    To determine your distance to Mr. Average, it would be useful to record the outcomes of the games you play at tournaments and while playtesting. This might help to determine what your MUs (not those of your deck) are. E.g. 4 months ago I found it quite hard to beat Merfolk with Goblins. My personal statistics told me that I was far from winning 55% of the games (like Mr. Average). So there was obviously a distance between him and me: my MU against Merfolk was bad, not that of my deck.
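
One rough way to put a number on that distance is to ask how likely your own record would be if you really were Mr. Average. The Python sketch below assumes independent games with a fixed 55% win chance; the record of 8 wins in 24 games is a made-up placeholder, not data from the table, and `prob_at_most` is just an illustrative helper name.

```python
from math import comb


def prob_at_most(wins, games, p):
    """P(X <= wins) for X ~ Binomial(games, p), computed exactly."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins + 1)
    )


# Hypothetical personal record with Goblins against Merfolk (placeholder numbers).
my_wins, my_games = 8, 24
mr_average_rate = 0.55  # Goblins' share of games in the Merfolk MU from part 2

tail = prob_at_most(my_wins, my_games, mr_average_rate)
print(f"Chance that Mr. Average wins {my_wins} or fewer of {my_games} games: {tail:.3f}")
```

A small probability suggests that the gap is more than bad luck; with only a handful of recorded games the number will rarely be small enough to say much.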
    The results tell us what Mr. Average does and are therefore not directly applicable to your particular decklist, playing style and skill. They are worthless without some thought and interpretation. Or as a user in this forum puts it:
    Quote Originally Posted by Mostly_Harmless View Post
    I suspect the real problem is that "Mr. Average" is pretty good with some decks (like goblins), but not all that good with others (like storm combo). (As anecdotal evidence, my TES vs. Goblins matchup improved dramatically after goldfishing a thousand or so times. It takes a lot of practice to make sure you don't lose to yourself, let alone hate. That's not nearly as much of an issue with goblins, where it's pretty straightforward to win if your opponent screws themself.) I suppose that's not necessarily a problem, though, as long as you understand that you're measuring matchups between all players, rather than particularly skilled players with a given deck.
    And that is what we need to keep in mind when we want to apply them. Let me tell you something from a psychologist’s point of view at the end of this part: people tend to overestimate themselves in fields they value highly. This effect is known as the “above-average effect”. To demonstrate this effect, researchers interviewed male drivers in hospital after they had had a car accident. 80% of those participants (all of them male, still injured though recovering from the crash!) rated themselves as better than the average driver.


    4. Contributions and limitations

    As I said right at the beginning: This article does not hold absolute truth about legacy. Neither do I claim that all of my methods are 100% perfect. In this last part I will discuss pros and cons of my analysis.

    What does the article contribute?

    This article is about averages, nothing more and nothing less. It tells us what the MUs of certain decks are like for average decklists, playing styles and skills. It reveals some rather surprising results that are far from the estimates that players actually give. This might help to reduce wrong impressions and ratings when discussing strategies against certain decks. It is designed to push thinking about playtesting, tournaments, deck and sideboard construction away from subjective opinions and to give a more detached view of MUs in general.

    What are limitations of the article?

    As I said throughout the article, there are two questionable points that I want to discuss.
    First of all: the categorization of decks. I doubt that people will be happy with such broad categories as “Survival”, “Landstill” and “UBx Stormcombo”. While the concept of Belcher is very clear-cut, the decks named before can differ a lot in their decklists. Nevertheless, there are reasons for this categorization. It would be quite hard to find enough data about decks like Survival if I were to split them up into their subtypes, and still I wanted to say something about them. I grouped them together because their function and win condition are very similar: a green-based deck that is able to create card advantage and to find flexible answers via Survival of the Fittest. The same is true for Landstill and UBx Stormcombo. All Landstill decks create card advantage via Standstill, have a slow win condition and are blue and control-based. All TES/ANT/DDANT lists are combo decks that share many mana-producing cards, the storm mechanic, tutors and cantrips, and are lethal on virtually turn 2-3. I know that experts on those decks will disagree with me, but this categorization does not mean that those decks are the same; the categorization is purely functional. Experts on those decks will know how to interpret the results for their particular deck: e.g. Survival Bant is better off against combo than Survival feat. Recurring Nightmare. Once again: results tell nothing without some thought and interpretation.
    Second, the application of the results. You might say that the results are not very helpful because they contain data from the lousiest losers as well as from tournament winners and that they therefore can’t be applied to anyone. When looking at the Merfolk (45%) – Goblins (55%) MU you might come up with explanations such as “you just recorded too many lousy Goblins players, this MU must be better”. Maybe you are right, maybe you aren’t. There is always some randomness in statistics. The more games I record, the less likely it is that the result is distorted by randomness. To ensure at least a little significance I only reported MUs for which I have 10 or more recorded matches (thus at least 20 games). One should also take note of what I wrote at the end of part 3: results tell nothing without some thought and interpretation.






    Thank you for reading my article.
    I would like to hear any helpful comment.
    If you have any questions please feel free to ask, I will try to answer them.
    Last edited by GoboLord; 09-11-2010 at 04:14 PM.

  2. #2
    They call me a slob, but I do my job...
    Cthuloo's Avatar
    Join Date

    Sep 2009
    Location

    Back to the city by the sea, blowin' in the wind, fighting with hordes of retired people
    Posts

    274

    Re: [article] Sad but true: Match-up estimations gone wild

    Great work. There's definitely a lack of hard matchup data, and everyone who tries to fill this gap is a great help in making us all understand the meta better. There's a point I would like to discuss, though.

    [WARNING : A bit of math will follow]

    You mentioned Mr. Average in your article, and all the caveats needed to deal with him. Actually, to be sure we understand what Mr. Average is telling us, we can get the help of Mr. Variance. Let's take a real example. You collected (if I counted correctly) a total of 185 Zoo vs. Merfolk games. It turns out Zoo won 111 of them, while Merfolk won 74. Mr. Average tells us that the matchup is 60-40. Let's do the simplest treatment of this statistic. In principle we can model each game as a Bernoulli trial, calling

    P = Probability that Zoo wins a game
    Q = Probability that Merfolk wins a game

    with P+Q = 1, P = 1-Q.

    Then the data for a number N of games should follow a binomial distribution centered on N*P. This distribution has a variance of the form

    V = NPQ

    In your case, the center of the distribution is 111, and the variance is

    V = 185*0.6*0.4 = 44.4

    Since the number of events is decently large, we can then approximate the distribution with a Gaussian with a standard deviation S = Sqrt(V) = 6.66. Then we can say something about the "true" value of the matchup. Calling M the number of games won by Zoo, we see that:

    104 < M < 117 with 68% probability
    97 < M < 125 with 95% probability
    91 < M < 131 with 99% probability

    We can then say with a confidence of 95% that the matchup is in Zoo's favour (i.e. M/N = 97/185 > 0.5).

    This also means that we still need a lot more data to draw precise conclusions based on data alone. This is particularly true in the case of Storm Combo vs. Goblins, where you have a total of 53 events, and the variance is roughly

    V = 53*0.5*0.5 = 13.25

    Then

    S = 3.64

    Looking at the number of games won by Goblins (which by the measured data is 27/53), called G:

    23 < G < 31 with 68% probability
    19 < G < 35 with 95% probability
    16 < G < 38 with 99% probability

    So we can't say with decent certainty that the matchup is not 30 - 23 instead of 26 - 27, and there is more than a 5% probability that it is instead something like 34 - 19 (which means roughly 65% in favour of the storm combo player).
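
The same arithmetic can be sketched in a few lines of Python. It assumes independent games and the Gaussian approximation used above; `win_count_ranges` is just an illustrative helper name, and 1, 2 and 3 standard deviations correspond to roughly 68%, 95% and 99.7% of the distribution.

```python
from math import sqrt


def win_count_ranges(wins, games):
    """Return (S, ranges), where ranges[k] is wins +/- k standard deviations."""
    p = wins / games                # observed win rate, used as the estimate of P
    sd = sqrt(games * p * (1 - p))  # binomial standard deviation, sqrt(N*P*Q)
    ranges = {k: (wins - k * sd, wins + k * sd) for k in (1, 2, 3)}
    return sd, ranges


for name, wins, games in [("Zoo vs. Merfolk", 111, 185), ("Goblins vs. Storm", 27, 53)]:
    sd, ranges = win_count_ranges(wins, games)
    print(f"{name}: S = {sd:.2f}")
    for k, (low, high) in ranges.items():
        print(f"  within {k} S: {low:.1f} to {high:.1f} wins")
```

For the Storm example the script uses the observed rate 27/53 instead of the rounded 0.5, which changes S only in the third decimal place.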

    I'm not sure if you followed me, but basically the message is: we need more data, and we need to be careful in interpreting the data we have. Let me stress again how much I appreciated your work, and that this criticism is meant to be constructive and is by no means directed at how you conducted your research.
    Team Stimato Ezio: You're off the team!

    People demand freedom of speech as a compensation for the freedom of thought which they seldom use.
    -Kierkegaard

  3. #3
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    Re: Sad but true: Match-up estimations gone wild

    At university I'm a bit into statistics myself, so I understand the essence of what you wrote (and calculated). I know that, although it took 9 months to gather the data, it takes much more to make the results significant with p < .05
    Plus there are many MUs for which I don't even have a single record, so there is much more work waiting... Unfortunately Legacy remains an ever-changing format, so nobody knows what decks we will have next year. I didn't even dream of Reanimator's drop-out when I started the analysis.

    Thanks for your comment.

  4. #4
    ..sry, whut? ◔̯◔
    Humphrey's Avatar
    Join Date

    Jan 2008
    Location

    Germany
    Posts

    730

    Re: Sad but true: Match-up estimations gone wild

    Awesome work!

    Yeah, it would be nice to get more data, maybe every event with more than 100 players in one year.

    What I'd like to know is which deck(s) have the best average matchup against the field and which are the most played.

    And I don't understand what the XXX after 2:0 means.
    Got tired of Legacy and you like drafts? Try my Paupercube What?

  5. #5
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by Humphrey View Post
    Awesome work!

    Yeah, it would be nice to get more data, maybe every event with more than 100 players in one year.

    What I'd like to know is which deck(s) have the best average matchup against the field and which are the most played.

    And I don't understand what the XXX after 2:0 means.
    It's not useful to draw such a conclusion because the field does not contain rogue decks and other decks that I simply missed, like Faeries, New Horizons, Sneak Attack etc.

    Each X stands for one match that ended with that score (depending on the row it stands in).
    Therefore

    Deck A - Deck B

    2-0 XXX
    2-1 X
    1-1
    1-2 XXXX
    0-2 XXXXX.XX

    Deck A wins 3x 2-0, 1x 2-1
    Deck B wins 4x 2-1, 7x 2-0
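
For anyone who wants to turn such a tally back into numbers, here is a small Python sketch. It assumes that every X is one match and that the dots are only visual separators after each group of five; `parse_tally` and `game_share` are hypothetical helper names made up for this example.

```python
def parse_tally(lines):
    """Map each score line such as '0-2 XXXXX.XX' to a match count per score."""
    counts = {}
    for line in lines:
        parts = line.split(maxsplit=1)
        score = tuple(int(n) for n in parts[0].split("-"))
        marks = parts[1] if len(parts) > 1 else ""  # rows like '1-1' may be empty
        counts[score] = marks.count("X")            # dots are ignored
    return counts


def game_share(counts):
    """Share of all games won by Deck A (the first number of each score)."""
    games_a = sum(a * n for (a, _), n in counts.items())
    games_b = sum(b * n for (_, b), n in counts.items())
    return games_a / (games_a + games_b)


tally = ["2-0 XXX", "2-1 X", "1-1", "1-2 XXXX", "0-2 XXXXX.XX"]
counts = parse_tally(tally)
print(counts)
print(f"Deck A wins {game_share(counts):.0%} of the games")  # 12 of 35 games
```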

  6. #6
    keepin' it unreal
    caiomarcos's Avatar
    Join Date

    May 2007
    Location

    Gothenburg, Sweden
    Posts

    407

    Re: Sad but true: Match-up estimations gone wild

    I can't download the table, it says: "The file you are trying to access is temporarily unavailable."
    "Want all, lose all."

  7. #7
    Trample, Haste
    pippo84's Avatar
    Join Date

    Mar 2009
    Location

    Italy
    Posts

    467

    Re: Sad but true: Match-up estimations gone wild

    @ GoboLord: very interesting info! I will read the article again when I have more time, so I can look at the info in more depth. Good job anyway! I'll probably post some more comments later on. I'd also like to see more MU analysis.

    @ Cthuloo: I didn't understand a thing! Prepare to explain some of it to me next time...
    Team Stimato

    Quote Originally Posted by Julian23 View Post
    He told you a foil from Time Spiral was Summer?
    This man must be a Jedi.

  8. #8
    They call me a slob, but I do my job...
    Cthuloo's Avatar
    Join Date

    Sep 2009
    Location

    Back to the city by the sea, blowin' in the wind, fighting with hordes of retired people
    Posts

    274

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by GoboLord View Post
    At university I'm a bit into statistics myself, so I understand the essence of what you wrote (and calculated). I know that, although it took 9 months to gather the data, it takes much more to make the results significant with p < .05
    Plus there are many MUs for which I don't even have a single record, so there is much more work waiting... Unfortunately Legacy remains an ever-changing format, so nobody knows what decks we will have next year. I didn't even dream of Reanimator's drop-out when I started the analysis.

    Thanks for your comment.

    Yes, what you're doing is definitely a hard job! ;) But it's indeed very valuable. Even when the data are not conclusive, they can still be of great help.

    E.g., even if we can't really say the Storm vs. Goblins matchup really is even, the data suggest that it is definitely not impossible, and you made correct and interesting deductions about Gobbo's sideboard. I will definitely be interested in seeing updates to the table in the coming months! ;)

  9. #9
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by Cthuloo View Post
    Yes, what you're doing is definitely a hard job! ;) But it's indeed very valuable. Even when the data are not conclusive, they can still be of great help.

    E.g., even if we can't really say the Storm vs. Goblins matchup really is even, the data suggest that it is definitely not impossible, and you made correct and interesting deductions about Gobbo's sideboard. I will definitely be interested in seeing updates to the table in the coming months! ;)
    It would be helpful if you sent me spreadsheets (like the ones posted on starcitygames by Jared Sylva) when you find any.

  10. #10
    Member
    klaus's Avatar
    Join Date

    Oct 2007
    Location

    Berlin, Germany
    Posts

    1,203

    Re: Sad but true: Match-up estimations gone wild

    GoboLord, I appreciate your effort.
    But as you concluded, what you initiated would have to evolve into a Source collaboration boasting ten times as much data to become meaningful. Goblins having a positive Storm Combo MU in your analysis emphasizes that in bold letters.

  11. #11
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by klaus View Post
    GoboLord, I appreciate your effort.
    But as you concluded, what you initiated would have to evolve into a Source collaboration boasting ten times as much data to become meaningful. Goblins having a positive Storm Combo MU in your analysis emphasizes that in bold letters.
    Well actually it's a statistical law that results don't change much if you have a sample with N > 30.

    This means that if I recorded 30 games of Merfolk vs. Zoo and the outcome is e.g. 40% - 60%, it won't be 60% - 40% after 300 recorded games.
    In the particular case of UBx Stormcombo vs. Goblins I recorded 53 games. Thus the MU percentage won't change dramatically with 530 games; it will stay around 50%-50%, +/- 5% maybe.

    My N is at least 20 for every MU, so it doesn't take much more data to make the numbers I already reported significant. What I need is more data about the MUs I haven't reported yet.
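
How much room a percentage still has to move can be estimated with the usual normal-approximation margin for a proportion. The Python sketch below is only a rough check under the assumption of independent games; `margin_95` is an illustrative helper name and the factor 1.96 is the standard value for a 95% interval.

```python
from math import sqrt


def margin_95(p, n):
    """Approximate 95% margin of error for an observed win rate p over n games."""
    return 1.96 * sqrt(p * (1 - p) / n)


for n in (30, 53, 530):
    print(f"N = {n:3d}: 50% +/- {margin_95(0.5, n):.1%}")
```

Under these assumptions the margin is about 13 percentage points at 53 games and drops to a bit over 4 points at 530 games, because it shrinks with the square root of N.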

  12. #12
    Member

    Join Date

    Jan 2009
    Location

    Tempe, AZ
    Posts

    31

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by GoboLord View Post
    Well actually it's a statistical law that results don't change much if you have a sample with N > 30.
    That's a little ridiculous. For one thing, "statistical law" doesn't mean your answer will never change drastically, it means your answer will probably not change drastically (for some value of probably). It's entirely possible you accidentally picked 40 very unlucky matches. That's not actually what I think happened, though. I suspect the real problem is that "Mr. Average" is pretty good with some decks (like goblins), but not all that good with others (like storm combo). (As anecdotal evidence, my TES vs. Goblins matchup improved dramatically after goldfishing a thousand or so times. It takes a lot of practice to make sure you don't lose to yourself, let alone hate. That's not nearly as much of an issue with goblins, where it's pretty straightforward to win if your opponent screws themself.) I suppose that's not necessarily a problem, though, as long as you understand that you're measuring matchups between all players, rather than particularly skilled players with a given deck.

    I'm curious to see what happens if you look at win percentages for particular players in a given matchup. I suspect you'd find several storm players with 80-20 or 90-10 records vs. goblins and a lot of players with 40-60 records.
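
To illustrate how such a split would show up in pooled data, here is a small Python sketch. The group sizes and win rates are invented for the example (they are not measured), and `pooled_win_rate` is a hypothetical helper name.

```python
def pooled_win_rate(groups):
    """groups: (share of all recorded games, win rate) pairs; shares must sum to 1."""
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * rate for share, rate in groups)


# Hypothetical storm field against Goblins: a few practiced pilots, many casual ones.
storm_groups = [
    (0.2, 0.85),  # 20% of the games played by pilots winning 85%
    (0.8, 0.40),  # 80% of the games played by pilots winning 40%
]
pooled = pooled_win_rate(storm_groups)
print(f"pooled storm win rate: {pooled:.0%}")
```

With these made-up numbers the pooled rate lands close to even, although no single group of players actually has an even matchup.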

    That said, I really do appreciate the work you put into this. It's nice to see someone actually look at data rather than just make educated guesses about matchups (like I just did =).)

  13. #13
    They call me a slob, but I do my job...
    Cthuloo's Avatar
    Join Date

    Sep 2009
    Location

    Back to the city by the sea, blowin' in the wind, fighting with hordes of retired people
    Posts

    274

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by Mostly_Harmless View Post
    That's a little ridiculous. For one thing, "statistical law" doesn't mean your answer will never change drastically, it means your answer will probably not change drastically (for some value of probably). It's entirely possible you accidentally picked 40 very unlucky matches. That's not actually what I think happened, though. I suspect the real problem is that "Mr. Average" is pretty good with some decks (like goblins), but not all that good with others (like storm combo). (As anecdotal evidence, my TES vs. Goblins matchup improved dramatically after goldfishing a thousand or so times. It takes a lot of practice to make sure you don't lose to yourself, let alone hate. That's not nearly as much of an issue with goblins, where it's pretty straightforward to win if your opponent screws themself.) I suppose that's not necessarily a problem, though, as long as you understand that you're measuring matchups between all players, rather than particularly skilled players with a given deck.

    I'm curious to see what happens if you look at win percentages for particular players in a given matchup. I suspect you'd find several storm players with 80-20 or 90-10 records vs. goblins and a lot of players with 40-60 records.
    The interesting thing is that, if we really had a huge amount of data, one could in principle also account for players' skill. The binomial distribution will have a peak around the average and then fall off on both sides following a precise law (it should look like a bell for a large amount of data). Then if you e.g. assume you are in the top 10% of storm combo players, you can locate your position on the curve and find the expected matchup average for your skill (supposing the skill distribution of combo players is Gaussian and not something completely weird for some reason). But this would probably require an amount of data we will never have.

    Quote Originally Posted by GoboLord View Post
    Well actually it's a statistical law that results don't change much if you have a sample with N > 30.

    This means that if I recorded 30 games of Merfolk vs. Zoo and the outcome is e.g. 40% - 60%, it won't be 60% - 40% after 300 recorded games.
    In the particular case of UBx Stormcombo vs. Goblins I recorded 53 games. Thus the MU percentage won't change dramatically with 530 games; it will stay around 50%-50%, +/- 5% maybe.

    My N is at least 20 for every MU, so it doesn't take much more data to make the numbers I already reported significant. What I need is more data about the MUs I haven't reported yet.
    I agree to some extent. 53 is not a really huge number, but it still tells us that it is highly unlikely that the matchup is better than 65-35 for combo (with 95% certainty), which IMHO is already worth knowing, since the conventional wisdom is that the matchup should be more like 80-20.

  14. #14
    Member

    Join Date

    Jul 2010
    Location

    Columbus, OH
    Posts

    48

    Re: Sad but true: Match-up estimations gone wild

    Great research, thanks!

  15. #15
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by Mostly_Harmless View Post
    That's a little ridiculous. For one thing, "statistical law" doesn't mean your answer will never change drastically, it means your answer will probably not change drastically (for some value of probably). It's entirely possible you accidentally picked 40 very unlucky matches. That's not actually what I think happened, though. I suspect the real problem is that "Mr. Average" is pretty good with some decks (like goblins), but not all that good with others (like storm combo). (As anecdotal evidence, my TES vs. Goblins matchup improved dramatically after goldfishing a thousand or so times. It takes a lot of practice to make sure you don't lose to yourself, let alone hate. That's not nearly as much of an issue with goblins, where it's pretty straightforward to win if your opponent screws themself.) I suppose that's not necessarily a problem, though, as long as you understand that you're measuring matchups between all players, rather than particularly skilled players with a given deck.

    I'm curious to see what happens if you look at win percentages for particular players in a given matchup. I suspect you'd find several storm players with 80-20 or 90-10 records vs. goblins and a lot of players with 40-60 records.

    That said, I really do appreciate the work you put into this. It's nice to see someone actually look at data rather than just make educated guesses about matchups (like I just did =).)
    What you say is absolutely true. That's why I referred to "Mr. Average". If most storm combo players do poorly against Goblins (40-60) and a few are very good (90-10), then it's just not true that this MU is in favor of storm combo, because in most cases it isn't.
    Note that this is exactly what I said in the last passage of part 3. I agree with you that if both players are skilled with their decks, combo should win with a greater probability. I repeatedly wrote that I don't compare players but averages. Still, if we take more data it will just contain the same proportion of bad and good players. Therefore the results probably (95%) won't change much.
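    To put rough numbers on the "Mr. Average" point: the sketch below averages the two win rates mentioned above, weighted by how many of the recorded matches each group plays. The 80/20 split is purely an assumption for illustration, not something taken from the data, and the function name is hypothetical.

```python
def population_win_rate(groups):
    """Weighted average win rate; groups = [(share_of_matches, win_rate), ...]."""
    total_share = sum(share for share, _ in groups)
    return sum(share * rate for share, rate in groups) / total_share


# Assumed split: 80% of storm matches are played at a 40% win rate,
# 20% at a 90% win rate. Only the two win rates come from the discussion.
storm_vs_goblins = [(0.8, 0.40), (0.2, 0.90)]
average = population_win_rate(storm_vs_goblins)
print(f"average storm win rate: {average:.2f}")
```

    Under that assumed split the pooled matchup is about even, even though the skilled minority wins nine matches out of ten.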

    I added your post to my article, because it makes clear what I meant.

  16. #16
    Member

    Join Date

    Jan 2009
    Location

    Tempe, AZ
    Posts

    31

    Re: Sad but true: Match-up estimations gone wild

    Well I guess we actually agreed, then. That'll teach me to read more carefully.

    @Cthuloo: I don't see any particular reason for the distribution of play skill to be normal. If anything, I'd expect a bimodal distribution for any sufficiently difficult deck (be it combo or countertop). The people who pick up a deck and play it for a tournament or two will be pretty lousy (I do this a lot with various tempo decks), while the people who pick a deck and stick with it get pretty good pretty fast. We don't get to use the Central Limit Theorem here (not that you seem to think we can; I just wanted to be clear) because each data point is the outcome of a single game/match, not the average of many.

  17. #17
    They call me a slob, but I do my job...
    Cthuloo's Avatar
    Join Date

    Sep 2009
    Location

    Back to the city by the sea, blowin' in the wind, fighting with hordes of retired people
    Posts

    274

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by Mostly_Harmless View Post
    Well I guess we actually agreed, then. That'll teach me to read more carefully.

    @Cthuloo: I don't see any particular reason for the distribution of play skill to be normal. If anything, I'd expect a bimodal distribution for any sufficiently difficult deck (be it combo or countertop). The people who pick up a deck and play it for a tournament or two will be pretty lousy (I do this a lot with various tempo decks), while the people who pick a deck and stick with it get pretty good pretty fast. We don't get to use the Central Limit Theorem here (not that you seem to think we can; I just wanted to be clear) because each data point is the outcome of a single game/match, not the average of many.

    You make a good point. What is really hard to model, however, is the actual shape of this bimodal distribution: where the peaks are and how high the "good players" peak is relative to the other. There could even be more peaks, and with many different peaks the final distribution may very well look like a Gaussian, but it's hard to tell. I find it difficult to imagine that we will ever have the data to reconstruct the shape of the distribution, so here's where personal experience plays a big role in filling the gaps.

  18. #18
    Member
    GoboLord's Avatar
    Join Date

    Apr 2010
    Location

    Germany
    Posts

    143

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by Cthuloo View Post
    You make a good point. What is really hard to model, however, is the actual shape of this bimodal distribution: where the peaks are and how high the "good players" peak is relative to the other. There could even be more peaks, and with many different peaks the final distribution may very well look like a Gaussian, but it's hard to tell. I find it difficult to imagine that we will ever have the data to reconstruct the shape of the distribution, so here's where personal experience plays a big role in filling the gaps.
    You are right, but finding the distribution of play skill is

    a) not possible, because skill must be measured by more than success with a particular deck
    b) not what I wanted to show with this analysis.

    I would now like to hear some advice before I go on collecting data.
    Maybe we could discuss more applications of the data? So far we have only found out what it cannot be applied to and what the limitations are.

  19. #19
    They call me a slob, but I do my job...
    Cthuloo's Avatar
    Join Date

    Sep 2009
    Location

    Back to the city by the sea, blowin' in the wind, fighting with hordes of retired people
    Posts

    274

    Re: Sad but true: Match-up estimations gone wild

    Quote Originally Posted by GoboLord View Post
    You are right, but finding the distribution of play skill is

    a) not possible, because skill must be measured by more than success with a particular deck
    b) not what I wanted to show with this analysis.
    You're definitely right, I was only replying to Mostly_Harmless's remark.

    Quote Originally Posted by GoboLord View Post
    I would now like to hear some advice before I go on collecting data.
    Maybe we could discuss more applications of the data? So far we have only found out what it cannot be applied to and what the limitations are.
    Just throwing in some ideas, probably not all of them very good:

    - Group decks into archetypes (the usual Aggro-Control-Combo, for instance) and see how they perform against each other (is it really Combo>Aggro>Control>Combo?). One could then repeat the process for single deck vs. archetype (is Mono Black a good choice in a field full of Aggro?). The classification doesn't need to be very precise, I guess, since the large amount of pooled data should be enough to draw some conclusions.

    - Look at the relation between game wins and match wins. In principle, if you have a probability X of winning a single game (and games are independent), you win a best-of-three match with probability Y = X^2 + 2*(1-X)*X^2 = 3*X^2 - 2*X^3 = X^2*(3-2*X). If the observed match win rate differs a lot from this value for a matchup, that should be an indication that sideboarding plays a big role.

    - Look at the % of matches that end in a draw for a given deck. This parameter can play a big role when deciding whether to bring the deck to a big tournament.
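    The game-to-match formula from the second idea can be sketched in a few lines. This assumes a best-of-three match with independent games that all have the same win probability X, which is exactly what sideboarding would violate; the function name is hypothetical.

```python
def match_win_probability(x: float) -> float:
    """Best-of-three match win probability for a per-game win probability x.

    Assumes independent games with the same x (so no sideboarding effect).
    """
    return 3 * x**2 - 2 * x**3


# Compare per-game and per-match win rates for a few values.
for x in (0.4, 0.5, 0.6, 0.7):
    print(f"game {x:.0%} -> match {match_win_probability(x):.1%}")
```

    Comparing this prediction with the recorded match win rate for a matchup would show how far the real numbers drift from the no-sideboarding baseline.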

    That's all I can think of for the moment.

  20. #20

    Re: Sad but true: Match-up estimations gone wild

    First: kudos for your comprehensive work on this. But I actually question its usefulness. The differences between individual builds of a given deck are just too large to compare them in a reasonable way. Just an example: Rhoner Merfolks vs Saito Merfolks, and their matchups against a CB Top Bant list Excalibur-style vs a CB Top Bant list with NO. Most of the deck types you compare vary hugely across competitive builds; only a few of them are nearly always the same. In my eyes, there is no way to eliminate this problem. For surveys, you could use reference builds. But data from tournaments is just not usable, as the variance in the builds is too high.

    Then, as already mentioned, there are some issues with your statistical method. Unfortunately, I can't do more than complain on this point because my statistics skills are poor.
