I implemented the Wald-Wolfowitz runs test, but during testing I encountered some weird behaviour. The steps I take are the following:
I take two samples out of the same distribution:
import numpy as np
list_dist_A = np.random.chisquare(2, 1000)
list_dist_B = np.random.chisquare(2, 1000)
I concatenate the two lists and sort them, while remembering which number came from which sample. The following function does that and it returns a list of labels ["A","B","A","A", ... "B"]
def _get_runs_list(list1, list2):
    # Add labels
    l1 = list(map(lambda x: (x, "A"), list1))
    l2 = list(map(lambda x: (x, "B"), list2))
    # Concatenate
    lst = l1 + l2
    # Sort by value
    sorted_list = sorted(lst, key=lambda x: x[0])
    # Return only the labels
    return [l[1] for l in sorted_list]
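As an aside, the same merge-and-label step can be written in vectorized form with NumPy; `runs_labels` is my own name for this sketch, not part of the original code:

```python
import numpy as np

def runs_labels(a, b):
    # Concatenate both samples, then read off the sample label
    # of each value in sorted order.
    values = np.concatenate([np.asarray(a), np.asarray(b)])
    labels = np.array(["A"] * len(a) + ["B"] * len(b))
    return labels[np.argsort(values, kind="stable")].tolist()

print(runs_labels([1.0, 3.0], [2.0, 4.0]))  # ['A', 'B', 'A', 'B']
```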
Now I want to calculate the number of runs (a run is a maximal consecutive sequence of identical labels), e.g. ["A","A","B","A","B","B"] contains 4 runs. For this I use the following code:
def _calculate_nruns(labels):
    nruns = 0
    last_seen = None
    for label in labels:
        # A run starts whenever the label changes
        if label != last_seen:
            nruns += 1
        last_seen = label
    return nruns
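For what it's worth, the same count can be cross-checked with `itertools.groupby`, which yields one group per run (`count_runs` is just an illustrative name):

```python
from itertools import groupby

def count_runs(labels):
    # groupby collapses each maximal run of identical labels into one group,
    # so the number of groups equals the number of runs.
    return sum(1 for _ in groupby(labels))

print(count_runs(["A", "A", "B", "A", "B", "B"]))  # 4
```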
Since all elements are randomly drawn, I thought I should roughly end up with an alternating sequence A, B, A, B, A, B, ..., which would mean the number of runs is roughly 2000.
However, as can be seen in this snippet on repl.it, this is not the case: it is always roughly around 1000.
Can anyone explain why this is the case?
~1000 is the expected result. Following the Wikipedia article on this statistical test, you have Np = Nn = 1000 and N = Np + Nn = 2000. That means the expected number of runs is mu = 2 * Np * Nn / N + 1, which is 1001.
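The mean (and, from the same Wikipedia article, the variance) of the number of runs under the null hypothesis can be computed directly; the function name here is my own:

```python
def runs_mean_variance(n_p, n_n):
    # Mean and variance of the number of runs under H0,
    # per the formulas in the Wikipedia article on the runs test.
    n = n_p + n_n
    mu = 2 * n_p * n_n / n + 1
    var = 2 * n_p * n_n * (2 * n_p * n_n - n) / (n ** 2 * (n - 1))
    return mu, var

mu, var = runs_mean_variance(1000, 1000)
print(mu)   # 1001.0
print(var)  # ~499.75
```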
Oh, that reminds me of the Gambler's fallacy.
I'm not a statistician, but to get 2000 runs you would need a 100% chance that an A follows a B and a B follows an A. That would mean the PRNG has some sort of memory of previous draws, which wouldn't be good...
OTOH, assume you have just drawn a value labelled A; then there's a 50% chance to draw another A and a 50% chance to draw a B. So the chance of a length-one run is actually only 50%, the chance of a length-two run is 25%, for length three it's 12.5%, for length four 6.25%, and so on.
The last part can easily be verified:
import numpy as np
list_dist_A = np.random.chisquare(2, 1000)
list_dist_B = np.random.chisquare(2, 1000)
listA = [(value, 'A') for value in list_dist_A]
listB = [(value, 'B') for value in list_dist_B]
combined = sorted(listA+listB, key=lambda x: x[0])
combined = [x[1] for x in combined]
from itertools import groupby
from collections import Counter
runlengths = [len(list(it)) for _, it in groupby(combined)] # lengths of the individual runs
print(Counter(runlengths)) # similar to a histogram
# Counter({1: 497, 2: 234, 3: 131, 4: 65, 5: 29, 6: 20, 7: 11, 8: 2, 10: 1, 14: 1})
So this is actually quite close to the expectation (which would be roughly 1: 500, 2: 250, 3: 125, 4: 62, ... as mentioned above). If your assumption were correct, it would be closer to 1: 2000, 2: 0, ...