
Weird MWS behavior put to test



DrJones
08-29-2009, 05:43 PM
I was testing some small changes to my Kavu/False Cure deck and found that changes that should be irrelevant to the deck's behavior (like replacing Putrid Leeches with Tarmogoyfs) were having a huge impact on consistency (I started getting mana-screwed and mana-flooded far more often than usual), which was weird. I asked my brother and he told me that maybe the shuffler was using the decklist contents as part of the random seed, and/or maybe it sorted the decklist alphabetically before shuffling, so the lands ended up in a different section of the deck, which caused many more mana screws and mana floods after a "random cut".

I was puzzled, so I devised the following test. I would play my deck as it was before the changes, and then I would make the changes to the decklist, but the new cards would act as proxies for the old ones. Since both decks are exactly the same, there shouldn't be any statistical difference after 100 goldfishes each, should there? :wink:

Surprise! The proxied test (http://www.mtgthesource.com/forums/showpost.php?p=375877&postcount=135) had about 50% and 90% more "bad hands" than the original decklist (http://www.mtgthesource.com/forums/showpost.php?p=375917&postcount=138), and autolosses were 4 times as frequent.

Any comments? Can someone else run this test to see if I did something wrong?

Otter
08-29-2009, 08:09 PM
I've been suspecting something like this for a while. I realize that anecdotal evidence isn't very useful in statistical analysis, but I've had situations where I'm trying to test something very specific (e.g., what's the difference between having a Spell Snare in my opening hand against ___ deck, versus having a Thoughtseize in that slot?). A few times I have swapped the one card for the other to do the second half of the testing, and then had to draw 30+ hands (of 7 cards each) before I got one of the four Thoughtseizes in my opening grip. That seems extremely statistically unlikely.

I'd definitely be interested in trying to test this stuff out.

beastman
08-29-2009, 09:00 PM
Everyone knows the MWS shuffler is shit. That's why nobody uses it for real testing. Most people use it strictly to kill time. You can do this experiment if you want, but it sounds like a waste of time to me.

pi4meterftw
08-29-2009, 09:58 PM
Everyone knows the MWS shuffler is shit. That's why nobody uses it for real testing. Most people use it strictly to kill time. You can do this experiment if you want, but it sounds like a waste of time to me.

Lol okay. It is true that the MWS shuffler isn't random in the usual sense. (Only quantum mechanics is truly stochastic.)

But let's call it pseudorandom if it has, in particular, one property that random events also have. Let N be the number of times the event is repeated, and let S_N be the number of times the outcome of interest occurs in those N repetitions. The property is that there exists a real number r in [0, 1] such that lim(N->infinity) S_N/N = r. Mind you, this limit is in the sequence sense, if you happen to also be trained in math and are scrutinizing my words.
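
To make that concrete, here's a toy Python sketch (nothing to do with MWS itself; the event probability r = 0.35 is arbitrary) showing the observed rate S_N/N settling toward r as N grows:

Code:
import random

# Toy check of the frequency property: the observed rate S_N/N of an
# event with probability r should settle near r as N grows.
r = 0.35
for n in (100, 10_000, 1_000_000):
    successes = sum(random.random() < r for _ in range(n))
    print(f"N = {n:>9}: S_N/N = {successes / n:.4f}")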

I won't take it all the way back to the axioms of math and philosophy explicitly, but I will note that it's pretty sensible to think that most RGFs ('random' generating functions) are pseudorandom in this sense. I don't understand what more you expect. MWS isn't a small enough system to exhibit quantum mechanics, so if pseudorandom isn't good enough, I can't imagine what you really wanted.

Also, the limiting behavior above only shows up as one actually takes the limit. Your 100 games don't show what you want them to show. In fact, the standard deviation of the observed rate at 100 games is 1/10 of what it would be if you played 1 game (provided you're taking an "average" consistency), but you could play 1 trillion games and still not have proven anything. The only way to prove your claims is to look at the MWS code, as a friend of mine did; he said it was pseudorandom.
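
And a quick sanity check on that 1/10 factor, again just a toy simulation that assumes a "bad hand" probability of 0.35:

Code:
import random
import statistics

P_BAD = 0.35          # assumed probability of a "bad hand"
SESSIONS = 10_000     # number of repeated sessions used to measure the spread

def bad_rate(games: int) -> float:
    """Fraction of bad hands observed in one session of `games` games."""
    return sum(random.random() < P_BAD for _ in range(games)) / games

for games in (1, 100):
    rates = [bad_rate(games) for _ in range(SESSIONS)]
    print(f"{games:>3}-game sessions: std dev of observed rate = {statistics.stdev(rates):.3f}")
# The 100-game spread comes out roughly a tenth of the 1-game spread.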

beastman
08-29-2009, 10:00 PM
I finally understand your name! You're one of those math nerds! :tongue:

MMogg
08-29-2009, 10:19 PM
Everyone knows the MWS shuffler is shit. That's why nobody uses it for real testing. Most people use it strictly to kill time. You can do this experiment if you want, but it sounds like a waste of time to me.

The problem is that many of us don't have access to any kind of Magic environment, or, for those who are a little more fortunate, to high-level Legacy testing environments. I know, sucks to be us. Boo hoo. I think that's why some people are a little upset about the disparity between real-life shuffling and MWS poker-god hands.

DrJones
08-30-2009, 02:46 AM
Also, another thing is that above, the limiting behavior only occurs as one, of course, actually takes a limit. Your 100 games doesn't show what you want it to show. In fact, the standard deviation at 100 games is 1/10 of what it would be if you played 1 game. (Provided you're taking an "average" consistency) But you could play 1 trillion games and still not have proven anything. The only way to prove your claims is to look at MWS coding, as my friend did, who said it was pseudorandom.

That is an appeal to authority, and a gross oversimplification of what statistics represent. 100 tests might be few, but it's just enough to start seeing tendencies, and those tendencies seem to corroborate the initial expectation. You can argue that no finite number of tests can prove anything, because there will always be a ~0.0000001 chance of getting weird statistics after 1,000,000 tests over the same population, but the most likely outcome is that the result falls well inside a really small interval close to the expected value. Statisticians never demand absolute certainty, for exactly that reason.

In fact, there's a statistical tool I have somewhere to which I could feed the data, and it would tell me whether the two samples come from the same population with the same expected value and the same standard deviation. I think that's overkill, and it wouldn't prove anything to you anyway because I didn't draw 1,000,000+ sample hands. I'm not going to write a Ph.D. thesis about the MWS shuffler either. I just saw something weird and tested it a bit before talking about it.
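
For what it's worth, the simplest version of that kind of check is an ordinary two-proportion z-test, which you can do by hand; here's a rough Python sketch (the 32-vs-43 counts below are placeholders, not my actual data):

Code:
from math import sqrt, erf

def two_proportion_z(bad1: int, n1: int, bad2: int, n2: int):
    """Two-proportion z-test: are the two bad-hand rates plausibly the same?"""
    p1, p2 = bad1 / n1, bad2 / n2
    pooled = (bad1 + bad2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the normal approximation.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Placeholder counts: 32 bad hands in 100 games vs 43 in 100 games.
z, p = two_proportion_z(32, 100, 43, 100)
print(f"z = {z:.2f}, p = {p:.3f}")   # p well above 0.05 -> no evidence the rates differ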

My suspicion is that MWS has either:
- A bug in its source of random numbers.
- A non-obvious bug in the shuffling implementation that your friend missed.

However, I would rule out:
- A flawed shuffling algorithm, because that alone couldn't affect this test (both decklists describe the same 60 cards).

Otter
08-30-2009, 03:51 AM
I ran 100 tests on three very basic deck configurations. All of them came out pretty much the same and near what we would expect (see the hypergeometric sketch after the results). However, it could possibly be a problem that only arises with actual (read: more complicated) decklists, so I'll try some tomorrow.

56 Mountains, 4 Lightning Bolts
60 hands: 0 Bolts
37 hands: 1 Bolt
3 hands: 2 Bolts

56 Mountains, 4 Sadistic Hypnotist
59 hands: 0 Hypnotist
34 hands: 1 Hypnotist
5 hands: 2 Hypnotist
2 hands: 3 Hypnotist

56 Mountains, 4 Akroma's Vengeance
50 hands: 0 Vengeance
43 hands: 1 Vengeance
7 hands: 2 Vengeance
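
For comparison, here are the exact hypergeometric expectations for a 4-of in a 60-card deck with a 7-card hand (my own quick calculation, not MWS output); the counts above all land pretty close to these:

Code:
from math import comb

DECK, COPIES, HAND = 60, 4, 7

for k in range(COPIES + 1):
    # Hypergeometric: ways to pick k copies and HAND-k other cards.
    p = comb(COPIES, k) * comb(DECK - COPIES, HAND - k) / comb(DECK, HAND)
    print(f"{k} copies in opening hand: {100 * p:5.1f} expected per 100 hands")
# Roughly 60.0 / 33.6 / 5.7 / 0.4 / 0.0 hands per 100.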

DrJones
08-30-2009, 11:14 AM
I looked more into it, and found this thread on the MWS boards:

Pseudo-random number generation and usage in a shuffler (http://www.magi-soft.com/forum/viewtopic.php?f=11&t=2508)

The thread explains that the default source of random number generation used is not fair for decks over 12 cards (a generator with only 2^32 possible internal states can't even reach all 13! ≈ 6.2 billion orderings of a 13-card deck, let alone all orderings of a 60-card one). It explains the reasoning and suggests using the Mersenne Twister, which is a really good random number generator (as long as the seed is good too). The main programmer answers and agrees to implement a new one in MWS 0.95.
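
For reference, a fair shuffle doesn't need much more than a good generator plus a plain Fisher-Yates pass; here's a generic Python sketch (Python's built-in random module happens to be an MT19937 Mersenne Twister), not the actual MWS code:

Code:
import random

def fisher_yates_shuffle(deck: list) -> None:
    """In-place Fisher-Yates shuffle driven by a Mersenne Twister PRNG."""
    for i in range(len(deck) - 1, 0, -1):
        j = random.randint(0, i)      # uniform over 0..i inclusive
        deck[i], deck[j] = deck[j], deck[i]

deck = list(range(60))
fisher_yates_shuffle(deck)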

The latest MWS version is 0.94f, which could explain many things. There are still some questions left unanswered, because 0.94f seems to postdate that thread. MWS 0.95 also seems to be coded in Lua rather than Delphi, so maybe they couldn't reuse code from one project in the other.

ParkerLewis
08-30-2009, 12:18 PM
I don't see anything in your experiment that shows any kind of defect in the shuffler.

As already stated, 100 draws is not enough here to discriminate 32 occurrences from 43, seeing how the probability of the "bad draw" is around 0.35: the standard deviation here is close to the maximum of 5, which would be reached if the probability were 0.5. Imagine the actual probability was 0.37 (the mean of the two tries): then you got one try at about minus one standard deviation and a second try at about plus one standard deviation...
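
The arithmetic, for anyone who wants to check it (a quick sketch using the binomial standard deviation sqrt(n*p*(1-p))):

Code:
from math import sqrt

n = 100
for p in (0.35, 0.5):
    print(f"p = {p}: std dev of the count over {n} draws = {sqrt(n * p * (1 - p)):.2f}")
# Counts of 32 and 43 are each only a bit over one standard deviation away
# from a common mean of 37.5, which is entirely unremarkable.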

...go on, there is nothing to see here.

Finally, the link you provided in your last post has little to do with this supposed clumping problem. Even if the latest versions still had the old algorithm, that would only mean that not all end results are possible (which, true, is indeed bad in itself)... but at this point there is no reason to suspect that the remaining possible end results would have more (or, for that matter, fewer) clumps.

DrJones
08-30-2009, 01:00 PM
Finally, the link you provided in your last post has few to do with this supposed clumping problem. Even if the latest versions still had the old algorithm, that would only mean that not all end-results are possible (which, true, is indeed bad in itself)... but at this point there is no reason to suspect that the remaining possible end-results would have more (or, for that matter, less) clumps.

Yes, it does. Let's say that the possible end results favour certain positions over others, and suppose that the process always starts from an ordered decklist. You could then swap cards in the decklist so that some cards end up in the favoured positions. Also, if you read further down that thread, the programmer asks for a reliable way to create a "seed", which suggests that MWS is likely using a faulty seed too.

I think that MWS has a calculator tool in which you can specify the number of starting hands, and it works out how often you draw certain card combinations and/or lands. I'll give it a try next time I use it, just to get a reliable estimate of the probability of a bad draw without having to play 10,000 games. :wink:
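
In the same spirit, the probability of a bad draw can be computed directly instead of sampled; here's a rough sketch where "bad" is arbitrarily taken to mean fewer than 2 or more than 5 lands in a 7-card hand from a 22-land, 60-card deck (my own cutoffs, purely illustrative):

Code:
from math import comb

DECK, LANDS, HAND = 60, 22, 7

def p_lands(k: int) -> float:
    """Probability of exactly k lands in the opening hand."""
    return comb(LANDS, k) * comb(DECK - LANDS, HAND - k) / comb(DECK, HAND)

bad = sum(p_lands(k) for k in range(HAND + 1) if k < 2 or k > 5)
print(f"P(mana screw or flood in the opener) = {bad:.3f}")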

ParkerLewis
08-30-2009, 01:35 PM
Yes, it has. Let's say that the possible end-results favour certain positions over others, and suppose that the process always start from an ordered decklist.

It all hinges on the "let's say". My point is that, based on the method used to shuffle, the "allowed" permutations have the same average number of clumps (as long as you're not looking at the tenth decimal place) as the full set of permutations does.

(Note: this would have to be confirmed, as it is based on what I remember from reading up on the algorithm employed in MWS a few years ago. Still, at this point, there is strictly ZERO evidence of the old shuffler being biased in that respect.)


You could swap cards in the decklist so that some cards end in the favoured positions. Also, if you read further down that thread, the programmer asks for a reliable way to create a "seed", which means that MWS is likely using a faulty seed too.

It looks like you're misunderstanding the discussion in the link you provided, then. It also looks like you have no clue what a seed is (it's no big deal, but in this case you should be careful before raising it here). In any case, the question referred to how to generate a seed for the new Mersenne algorithm, which had more complex seed requirements than the old algorithm (if you wanted to use it "at its best").


I think that MWS has a calculator tool to which you can specify the number of starting hands and it calculates how many times you draw certain card combinations and/or lands. I'll give it a try next time I use it, just to get a reliable way to find the probability of a bad draw without having to play 10000 games. :wink:

It does, although it's limited to pretty simple calculations (combining the "or" and "and" options never worked for me).

Dark_Shakuras
09-05-2009, 01:02 PM
In computer science class, we always just used the computer's clock time as the seed. Then you NEVER have the same seed twice...

(well unless you shuffled up on October 8th at 07:34:54 twice in a row...)
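
For what it's worth, a whole-seconds clock seed does collide if you reseed more than once per second; a quick generic illustration (plain Python, nothing to do with MWS internals):

Code:
import random
import time

# Seeding from the clock in whole seconds: two shuffles seeded within the
# same second get the same seed and therefore the same "random" order.
random.seed(int(time.time()))
first = random.sample(range(60), 60)

random.seed(int(time.time()))   # almost certainly still the same second
second = random.sample(range(60), 60)

print(first == second)          # very likely True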