All B/R update speculation.

**Zoid** · 11-23-2021 03:24 PM

Originally Posted by Reeplcheep

Here is the data from the legacy data project. Hard non-mirror winrates collected by hand. This was a ton of work so if you like stuff like this please consider helping out with data collection or the Patreon.

TBH I don't really care enough about MTG anymore to even bother.
The fact that it needs the community to collect and process the data instead of WotC is not helping.
As a physicist I (should) simply know more about statistics, and it bugs me when people use them wrong.

Back on topic: it's striking that even though monke decks are the most played and the meta is tailored around them, UR at least is still at >=50% WR.

**FourDogsinaHorseSuit** · 11-23-2021 07:52 PM

Originally Posted by Zoid

As a physicist I (should) simply know more about statistics, and it bugs me when people use them wrong.

Originally Posted by Zoid

Assuming a 50% win rate seems a stretch.

**Zoid** · 11-23-2021 08:46 PM

?

I still think that is a bad assumption.
There's no reason to assume that any matchup besides the mirror is 50%.
Why even assume anything in the first place?
You just present the matchup data as it is, done.

**Reeplcheep** · 11-24-2021 10:21 AM

Originally Posted by Zoid

?

I still think that is a bad assumption.
There's no reason to assume that any matchup besides the mirror is 50%.
Why even assume anything in the first place?
You just present the matchup data as it is, done.

If you really are a physicist, you should know the definition of the null. You always assume no difference and try to prove otherwise. From thermodynamics you should also know the principle of informational entropy: the null has to be the same as assigning labels at random. Otherwise you would conclude that snow-basics D&T's 25-24 head-to-head record against non-snow D&T is telling you something.

In the vast majority of cases H_0 is p = 1/(# of options), here 2: W or L.
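To make the 25-24 example concrete, here is a minimal sketch (Python, standard library only) of an exact two-sided binomial test against H_0: p = 1/2. The function name `two_sided_coin_test` is hypothetical, and the shortcut of doubling one tail is only valid because p = 1/2 makes the null distribution symmetric.

```python
from math import comb


def two_sided_coin_test(wins: int, losses: int) -> float:
    """Exact two-sided binomial test of H0: p = 1/2 (a fair coin).

    Under p = 1/2 the null distribution is symmetric, so the two-sided
    p-value is twice the tail beyond the more extreme side, capped at 1.
    """
    n = wins + losses
    extreme = max(wins, losses)
    # P(X >= extreme) for X ~ Binomial(n, 1/2), using exact integer counts
    tail = sum(comb(n, k) for k in range(extreme, n + 1)) / 2**n
    return min(1.0, 2 * tail)


# Snow-basics D&T vs non-snow D&T, 25-24 head to head
print(two_sided_coin_test(25, 24))  # → 1.0
```

A p-value of 1.0 means a fair coin always produces a split at least this lopsided over 49 games, so the 25-24 record carries no evidence either way.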

**FourDogsinaHorseSuit** · 11-24-2021 10:45 AM

Originally Posted by Zoid

?

I still think that is a bad assumption.
There's no reason to assume that any matchup besides the mirror is 50%.
Why even assume anything in the first place?
You just present the matchup data as it is, done.

It's basic statistics that you test your results against flipping a coin.
A coin will predict the correct outcome 50% of the time, and if you say that a MU is 75/25 you're claiming you can predict it 75% of the time.

**dte** · 11-24-2021 11:14 AM

Fully agreeing with Zoid here.
You present the data; transforming it is useful only if the result is more informative.

Here I do not see it:
600-400 is more meaningful than 60-40, itself better than 6-4.
I do not see what statistical treatment could make it more easily understandable to the audience: Magic players who understand what a MU is, both in terms of the result and the reliability of that result.

**FourDogsinaHorseSuit** · 11-24-2021 11:49 AM

Originally Posted by dte

Fully agreeing with Zoid here.
You present the data; transforming it is useful only if the result is more informative.

Here I do not see it:
600-400 is more meaningful than 60-40, itself better than 6-4.
I do not see what statistical treatment could make it more easily understandable to the audience: Magic players who understand what a MU is, both in terms of the result and the reliability of that result.

If only there was a way to see if the data you collected was as good as flipping coins.
Oh well, I'm sure it's the field of statistics that's wrong here.

**dte** · 11-24-2021 12:27 PM

Originally Posted by FourDogsinaHorseSuit

If only there was a way to see if the data you collected was as good as flipping coins.
Oh well, I'm sure it's the field of statistics that's wrong here.

The field of statistics cannot answer that. There are formulas that tell you whether, from some data, you can pin the probability within a given range at a given confidence.

So here you could say that there is a >95% probability that deck 1 has a win rate between 55 and 65% against deck 2.

It would still be reduced data, i.e. less information than the actual numbers, e.g. 600-400 (numbers not corresponding to the statement above).

It is very useful to do statistical treatment, but only if it gives you a faster, better understanding. I do not think that it is the case here.
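For reference, the kind of range dte describes can be computed directly from W-L. This is a minimal sketch assuming the Wilson score interval, which is one of several standard binomial intervals (the thread does not name one); `wilson_interval` is a hypothetical helper. In the frequentist reading, the 95% describes how often the procedure covers the true win rate across repeated samples, not a probability about this one dataset.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(wins: int, losses: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a win rate (one common choice among several)."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one match")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = wins / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


for w, l in [(6, 4), (60, 40), (600, 400)]:
    lo, hi = wilson_interval(w, l)
    print(f"{w}-{l}: 95% interval for the win rate {lo:.3f} to {hi:.3f}")
```

At the same 60% win rate the interval narrows sharply from 6-4 to 600-400, which is dte's ordering expressed as a number.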

**FourDogsinaHorseSuit** · 11-24-2021 01:07 PM

Originally Posted by dte

The field of statistics cannot answer that.

It's literally the definition of null hypothesis testing.

**Reeplcheep** · 11-24-2021 01:30 PM

Originally Posted by dte

So here you could say that there is a >95% probability that deck 1 has a win rate between 55 and 65% against deck 2.

You can’t say that. The confidence interval is about future reproductions, not the current event. You can only say that a random fair sample (in this case flipping a coin) would have produced this result <5% of the time, so you can reject the null that both sides are the same.
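The "how often would a fair coin produce this" phrasing can be checked by brute force. A minimal sketch, assuming a one-sided question (how often does a fair coin reach at least this many wins?); the function name, trial count and seed are arbitrary illustrative choices. It is a Monte Carlo approximation of the exact binomial tail, not a replacement for it.

```python
import random


def coin_flip_p_value(wins: int, losses: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(a fair coin wins at least `wins` of n games)."""
    n = wins + losses
    if n < 1:
        raise ValueError("need at least one match")
    rng = random.Random(seed)  # seeded so reruns give the same estimate
    hits = 0
    for _ in range(trials):
        # n random bits = n fair coin flips; counting the 1-bits counts "wins"
        simulated_wins = bin(rng.getrandbits(n)).count("1")
        if simulated_wins >= wins:
            hits += 1
    return hits / trials


print(coin_flip_p_value(60, 40))  # small: a coin rarely goes 60-40 or better
print(coin_flip_p_value(6, 4))    # large: a coin often goes 6-4 or better
```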

**FTW** · 11-24-2021 01:47 PM

Originally Posted by dte

Here I do not see it:
600-400 is more meaningful than 60-40, itself better than 6-4.
I do not see what statistical treatment could make it more easily understandable to the audience: Magic players who understand what a MU is, both in terms of the result and the reliability of that result.

600-400 is obviously more meaningful than 6-4. But let's look at less extreme cases.

If HomeBrew.dec goes 6-4 vs Delver, does that mean your homebrew is favored against Delver or was that just a lucky streak? Maybe the matchup is about even? Maybe it's unfavored? (Those are the most common matchup classifications players use)

Maybe players would intuitively know that's too few games and they need to test more (although some take a single League 5-0 as proof a deck is good, so you never know). But what if that result was 12-8? 18-12? 24-16? 60-40? That's more than 6-4, but is it enough? At what point have you played enough games to be reasonably sure HomeBrew is favored against Delver? That's not easy to know intuitively from the raw results, and that's where a statistical treatment adds value. If you do a 1-tailed test with null 50%, it basically tells you whether you had enough games to conclude the matchup is favorable (technically you're rejecting that the matchup is even or unfavorable, but close enough).
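The 1-tailed test FTW describes can be run on exactly that sequence of records. A minimal sketch using an exact binomial tail computed in log space (so large samples don't overflow); `binom_pmf` and `one_sided_p_value` are hypothetical helper names, and the `p0` argument is included only so other nulls can be tried.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def one_sided_p_value(wins: int, losses: int, p0: float = 0.5) -> float:
    """Exact one-tailed binomial test of H0: win rate <= p0 vs H1: win rate > p0.

    Returns P(X >= wins) for X ~ Binomial(wins + losses, p0); requires 0 < p0 < 1.
    """
    n = wins + losses
    return min(1.0, sum(binom_pmf(k, n, p0) for k in range(wins, n + 1)))


# Same 60% win rate, growing sample: when does "favored" become defensible?
for w, l in [(6, 4), (12, 8), (18, 12), (24, 16), (60, 40)]:
    print(f"{w}-{l}: one-sided p = {one_sided_p_value(w, l):.3f}")
```

On these records only 60-40 clears a 5% threshold; the earlier ones are all 60% win rates that a fair coin matches too often to call the matchup favorable.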

It shouldn't come at the cost of presenting the real data. Sometimes people report only a p-value without presenting any of the actual data, but that isn't the only way to present it. You can show both.

Two examples:
1) 60%* (N=60)
2) 36-24*

That's clean and simple and still loses no information from the original data. Both tell you that 60 matches were played, 60% were wins, 40% were losses, for an overall result of 36-24, AND that the matchup was favorable at some standard level of statistical significance you can mention outside the table (e.g. alpha=5%, alpha=10%). The statistical treatment adds value to the result: it tells a player that there was "enough" data to classify that as favorable, while 6-4 isn't enough.
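Both cell styles can be generated mechanically. A minimal sketch, assuming the asterisk means "a one-sided exact binomial test against 50% rejects at the stated alpha"; `upper_tail` and `format_cell` are hypothetical names, and the whole-percent rounding mirrors the "60%*" example.

```python
from math import comb


def upper_tail(wins: int, n: int) -> float:
    """P(X >= wins) for X ~ Binomial(n, 1/2), using exact integer counts."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n


def format_cell(wins: int, losses: int, alpha: float = 0.05, style: str = "record") -> str:
    """Render one matchup cell; '*' marks a significantly favorable result."""
    n = wins + losses
    star = "*" if upper_tail(wins, n) < alpha else ""
    if style == "record":
        return f"{wins}-{losses}{star}"
    # "percent" style: whole-percent win rate plus sample size
    return f"{100 * wins / n:.0f}%{star} (N={n})"


print(format_cell(36, 24, alpha=0.10))                   # → 36-24*
print(format_cell(36, 24, alpha=0.10, style="percent"))  # → 60%* (N=60)
print(format_cell(36, 24, alpha=0.05))                   # → 36-24
```

The star on 36-24 depends on the threshold: its one-sided p-value is roughly 0.08, so it is marked at alpha = 10% but not at 5%, which is why stating alpha next to the table matters. Whole-percent rounding keeps W-L exactly recoverable only for N up to 100; larger samples would need a decimal place.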

Or you could color-code the cells:
Green = Favorable (statistically significant at X% confidence)
Yellow = About even (not statistically different from 50-50 at X% confidence)
Orange = Unfavorable (statistically significant at X% confidence)

That should be easy to digest and does tell you more than just the matchup data without any further treatment.
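The three colors map onto two one-sided tests against 50%. A minimal sketch, assuming exact binomial tails and the same alpha for both directions; `classify_matchup` is a hypothetical name. Note that "About even" really means "not distinguishable from 50-50 with this many games", which covers both genuinely even matchups and cells with too few matches.

```python
from math import comb


def classify_matchup(wins: int, losses: int, alpha: float = 0.05) -> str:
    """Three-way label from two one-sided exact binomial tests against 50%."""
    n = wins + losses
    total = 2**n
    # Evidence the deck is favored: P(X >= wins) under a fair coin
    p_fav = sum(comb(n, k) for k in range(wins, n + 1)) / total
    # Evidence the deck is unfavored: P(X <= wins) under a fair coin
    p_unfav = sum(comb(n, k) for k in range(0, wins + 1)) / total
    # p_fav + p_unfav >= 1, so both can't fall below alpha when alpha < 0.5
    if p_fav < alpha:
        return "Favorable"    # green
    if p_unfav < alpha:
        return "Unfavorable"  # orange
    return "About even"       # yellow: not distinguishable from 50-50


for record in [(60, 40), (40, 60), (6, 4), (25, 24)]:
    print(record, classify_matchup(*record))
```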

**dte** · 11-24-2021 01:58 PM

Originally Posted by Reeplcheep

You can’t say that. The confidence interval is about future reproductions, not the current event. You can only say that a random fair sample (in this case flipping a coin) would have produced this result <5% of the time, so you can reject the null that both sides are the same.

I wrote "has", not "had"?

But my question stands: how does reducing the information, giving a range + confidence for a given MU, give a clearer picture than the actual data, i.e. W-L?

Edit: FTW answered meanwhile. In the example above, I see option 2) as an easier readout than option 1). I still do think that W-L is a better representation, cleaner and simpler, than adding some arbitrarily chosen confidence interval. It is just discretizing the confidence, rather than keeping a continuum.

**FourDogsinaHorseSuit** · 11-24-2021 02:03 PM

Originally Posted by dte

I wrote "has", not "had"?

But my question stands: how does reducing the information, giving a range + confidence for a given MU, give a clearer picture than the actual data, i.e. W-L?

Because you can't record all the data, a range + confidence also tells you how good the data even is. It also provides insight into the greater population, while the recorded results only provide information about the sample they are a part of.
This is what statistics is. It's about understanding the greater population of matches given a sample.

**dte** · 11-24-2021 02:15 PM

Originally Posted by FourDogsinaHorseSuit

Because you can't record all the data, a range + confidence also tells you how good the data even is. It also provides insight into the greater population, while the recorded results only provide information about the sample they are a part of.
This is what statistics is. It's about understanding the greater population of matches given a sample.

Recording all data or not has no influence here.
You have recorded data, in the form of W-L.
You do not get more or better data by performing whatever treatment you want on it, you are only modifying the representation of said data.
That you would settle on a given probability threshold, likely 90% or 95%, is simply discretizing the confidence, which depends on the sample size, something you already see from W-L in a pseudo-continuous fashion.

**FTW** · 11-24-2021 02:22 PM

Originally Posted by dte

Edit: FTW answered meanwhile. In the example above, I see option 2) as an easier readout than option 1). I still do think that W-L is a better representation, cleaner and simpler

I presented both because I think players may have different opinions on this when the numbers get messier. For 60-40, it's simple to do the mental math for win% and total number of matches, so the cleaner presentation is sufficient. If it was 47-36, the mental math to get win % is more of a burden on the reader, especially if there are 100+ cells in the table.

The 2nd is cleaner, but the 1st gives easier access to different information. It depends on which is of more interest. But I agree it should be done without information loss.

Originally Posted by dte

That you would settle on a given probability threshold, likely 90% or 95%, is simply discretizing the confidence, which depends on the sample size, something you already see from W-L in a pseudo-continuous fashion.

It also establishes a consistent benchmark for all cells, based on a fixed probability threshold instead of the difference between W and L. Otherwise that is hard to judge intuitively from W-L when each cell has a different number of matches.
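One way to see what a fixed threshold does across cells is to ask, for each sample size, how many wins are needed before a one-sided test against 50% calls the matchup favorable. A minimal sketch assuming exact binomial tails and alpha = 5%; `min_wins_for_favorable` is a hypothetical name.

```python
from math import comb
from typing import Optional


def min_wins_for_favorable(n: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest W such that W-(n-W) rejects 'even or worse' at level alpha.

    Uses the exact one-sided tail P(X >= W) under p = 1/2, which shrinks as W
    grows, so the first W that passes is the minimum. Returns None when even
    n-0 is not enough (very small samples).
    """
    for w in range(n + 1):
        if sum(comb(n, k) for k in range(w, n + 1)) / 2**n < alpha:
            return w
    return None


for n in [10, 20, 40, 60, 100, 200]:
    w = min_wins_for_favorable(n)
    print(f"N={n}: need {w}-{n - w} or better ({w / n:.0%})")
```

The required win percentage falls as N grows, so a single probability threshold corresponds to a different W-L cutoff in every cell, which is the inconsistency FTW points at when reading raw W-L.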

**dte** · 11-24-2021 02:33 PM

Originally Posted by FTW

I presented both because I think players may have different opinions on this when the numbers get messier. For 60-40, it's simple to do the mental math for win% and total number of matches, so the cleaner presentation is sufficient. If it was 47-36, the mental math to get win % is more of a burden on the reader, especially if there are 100+ cells in the table.

I find 47-36 perfectly clear, but that some would find a win% easier to read is a valid point indeed.

**Zoid** · 11-24-2021 04:19 PM

Originally Posted by Reeplcheep

If you are a really a physicist you should know the definition of null. You always assume no difference and try to prove otherwise. From thermodynamics you should also now the principle of informational entropy; the null has to be the same as assigning labels at random. Otherwise you would conclude snow basics d&t having a 25-24 head to head winrate vs non-snow D&T is telling you something.

H_0 is in the vast majority of case p=1/# of options, (in this case 2, W or L)

Originally Posted by FourDogsinaHorseSuit

It's basic statistics that you test your results against flipping a coin.
A coin will predict the correct outcome 50% of the time, and if you say that a MU is 75/25 you're claiming you can predict it 75% of the time.

I still don't know why you're so hung up on hypothesis testing.
There is no reason to assume anything.
You just present the data and that's it.

What I was initially suggesting was how to give an uncertainty to the win rates.
Here we either take the frequentist approach or use Bayesian statistics, where we need a prior.
That's where you can start to assume things which need to be well motivated and it depends on what you want to show.
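For the Bayesian route, the standard conjugate setup for a win rate is a Beta prior with a binomial likelihood. A minimal sketch, assuming a uniform Beta(1, 1) prior purely for illustration (a prior centered on 50% or informed by other matchups would shift the numbers, which is exactly the "well motivated" choice Zoid mentions); `beta_binomial_summary` is a hypothetical name. Integer prior parameters let the posterior probability of a >50% win rate be computed exactly with the standard library.

```python
from math import comb


def beta_binomial_summary(wins: int, losses: int, prior_a: int = 1, prior_b: int = 1):
    """Posterior mean and P(win rate > 1/2) under a Beta(prior_a, prior_b) prior.

    Beta is conjugate to the binomial, so the posterior is
    Beta(prior_a + wins, prior_b + losses). For integer a, b the identity
    P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1) gives an exact value.
    """
    a, b = prior_a + wins, prior_b + losses
    posterior_mean = a / (a + b)
    m = a + b - 1
    prob_favored = sum(comb(m, k) for k in range(a)) / 2**m
    return posterior_mean, prob_favored


mean, p_fav = beta_binomial_summary(6, 4)  # uniform Beta(1, 1) prior
print(f"posterior mean {mean:.3f}, P(win rate > 50%) = {p_fav:.3f}")
# → posterior mean 0.583, P(win rate > 50%) = 0.726
```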

**FTW** · 11-24-2021 04:33 PM

Originally Posted by Zoid

I still don't know why you're so hung up on hypothesis testing.
There is no reason to assume anything.
You just present the data and that's it.

There's no reason not to present the data and also test it, showing both. If some don't trust the testing, they can ignore that part, but those who do get more rather than less.

Whether you use hypothesis testing or Bayesian methods, both make similar assumptions (prior or null). 50% is reasonable because players tend to classify matchups as:
Favorable
Even
Unfavorable

A null of 50% allows you to do that. A null of 35% could tell you your deck has a >35% win rate against Delver, but that's not how most players want to think about their matchup info, at least not before knowing whether it's favorable, so the result of that test has less practical value. You can always test different nulls afterwards. 50% makes sense as a starting point.

This is a 2 player 0-sum game with a lot of chance. If neither player has an edge from the deck construction, you expect 50-50 odds by default. If your data contradict that, it tells you one deck is favored over the other.
(Player skill is a more relevant factor if you include LGS weeklies with a lot of new players, but if this is ripped from top tournament results then most players are good at their deck)

**FourDogsinaHorseSuit** · 11-24-2021 04:51 PM

Originally Posted by Zoid

I still don't know why you're so hung up on hypothesis testing.
There is no reason to assume anything.
You just present the data and that's it.

What I was initially suggesting was how to give an uncertainty to the win rates.
Here we either take the frequentist approach or use Bayesian statistics, where we need a prior.
That's where you can start to assume things which need to be well motivated and it depends on what you want to show.

Amazing

**ParkerLewis** · 11-24-2021 05:39 PM

Seriously, this is pretty dumb.

On one hand, replacing the W & L numbers with the best (MLE) estimate of the win rate, p_mle = W/(W+L), plus a second value representing the uncertainty (like the width of the 95% confidence interval for p_mle, or the quasi-std sqrt(p_mle*(1-p_mle)/n) with n = W+L (*)) doesn't reduce the available information, as from those two values you can reconstruct both W & L.

On the other hand, you don't need to assume anything to establish those. There is no hypothesis to make or test against. It's simple MLE.

(*) I'm saying quasi-std as this is improper; the only actual std is sqrt(p*(1-p)/n), where p is the actual value of the parameter. But:
- this doesn't change the fact that it allows the reconstruction of the original W & L numbers if one so desires,
- it still matches expectations quite adequately and properly represents the standard deviation of the process, a) given that real matchups never go outside 0.2-0.8 for p, and b) as long as you don't go out of your way to use it wrong, i.e. with only 5 matches or so.
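The "no information lost" claim can be checked by a round trip: compute (p_mle, quasi-std) from W-L, then invert n = p(1-p)/s² and W = p·n. A minimal sketch; `summarize` and `reconstruct` are hypothetical names, and the inversion fails in the cases the footnote warns about (p_mle of 0 or 1 gives a zero quasi-std) or when the two values are rounded too coarsely before publishing.

```python
from math import sqrt
from typing import Tuple


def summarize(wins: int, losses: int) -> Tuple[float, float]:
    """MLE win rate and the plug-in ("quasi") standard deviation."""
    n = wins + losses
    p_mle = wins / n
    quasi_std = sqrt(p_mle * (1 - p_mle) / n)
    return p_mle, quasi_std


def reconstruct(p_mle: float, quasi_std: float) -> Tuple[int, int]:
    """Recover W and L via n = p(1 - p) / s^2 and W = p * n.

    Breaks down when p_mle is 0 or 1 (quasi_std is then 0) and when the two
    values have been rounded too coarsely.
    """
    n = round(p_mle * (1 - p_mle) / quasi_std**2)
    wins = round(p_mle * n)
    return wins, n - wins


for record in [(6, 4), (47, 36), (600, 400)]:
    p, s = summarize(*record)
    print(record, f"p_mle={p:.3f}, quasi-std={s:.3f}", reconstruct(p, s))
```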
