I wrote a program that records how many times 2 fair dice need to be rolled to match the probabilities for each result that we should expect.
I think it works but I'm wondering if there's a more resource friendly way to solve this problem.
import random
expected = [0.0, 0.0, 0.028, 0.056, 0.083,
0.111, 0.139, 0.167, 0.139, 0.111,
0.083, 0.056, 0.028]
results = [0.0] * 13 # store our empirical results here
emp_percent = [0.0] * 13 # results / by count
count = 0.0 # how many times have we rolled the dice?
while True:
r = random.randrange(1,7) + random.randrange(1,7) # roll our die
count += 1
results[r] += 1
emp_percent = results[:]
for i in range(len(emp_percent)):
emp_percent[i] /= count
emp_percent[i] = round(emp_percent[i], 3)
if emp_percent == expected:
break
print(count)
print(emp_percent)
There are several problems here.
Firstly, there is no guarantee that this will ever terminate, nor is it particularly likely to terminate in a reasonable amount of time. Ignoring floating point arithmetic issues, this should only terminate when your numbers are distributed exactly right. But the law of large numbers does not guarantee this will ever happen. The law of large numbers works like this:
Notice that the initial bias is never counterbalanced. Rather, it is dwarfed by the rest of the results. This means the bias tends to zero, but it does not guarantee the bias actually vanishes in a finite number of trials. Indeed, it specifically predicts that progressively smaller amounts of bias will continue to exist indefinitely. So it would be entirely possible that this algorithm never terminates, because there's always that tiny bit of bias still hanging around, statistically insignificant, but still very much there.
That's bad enough, but you're also working with floating point, which has its own issues; in particular, floating point arithmetic violates lots of conventional rules of math because the computer keeps doing intermediate rounding to ensure the numbers continue to fit into memory, even if they are repeating (in base 2) or irrational. The fact that you are rounding the empirical percents to three decimal places doesn't actually fix this, because not all terminating decimals (base 10) are terminating binary values (base 2), so you may still find mismatches between your empirical and expected values. Instead of doing this:
if emp_percent == expected:
break
...you might try this (in Python 3.5+ only):
if all(map(math.is_close, emp_percent, expected)):
break
This solves both problems at once. By default, math.is_close() requires the values to be within (about) 9 decimal places of one another, so it inserts the necessary give for this algorithm to actually have a chance of working. Note that it does require special handling for comparisons involving zero, so you may need to tweak this code for your use case, like this:
is_close = functools.partial(math.is_close, abs_tol=1e-9)
if all(map(is_close, emp_percent, expected)):
break
math.is_close() also removes the need to round your empiricals, since it can do this approximation for you:
is_close = functools.partial(math.is_close, rel_tol=1e-3, abs_tol=1e-5)
if all(map(is_close, emp_percent, expected)):
break
If you really don't want these approximations, you will have to give up floating point and work with fractions exclusively. They produce exact results when divided by one another. However, you still have the problem that your algorithm is unlikely to terminate quickly (or perhaps at all), for the reasons discussed above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With