Attempted problem: The probability that one of two dice will have a higher value than a third die.
Problem: For some reason, when I use the random module from Python (specifically the sample method), I end up with a different (and incorrect) result from when I use numpy. I've included the results at the bottom. Repeated execution of the code yields similar results. Any ideas why random.sample and numpy.random.random_integers give different results even though they have the same function?
import numpy as np
import random

random_list = []
numpy_list = []
n = 500
np_wins = 0
rand_wins = 0

for i in range(n):
    # Three values drawn with random.sample
    rolls = random.sample(range(1, 7), 3)
    rand_wins += any(rolls[0] < roll for roll in rolls)
    # Three values drawn with numpy
    rolls = np.random.random_integers(1, 6, 3)
    np_wins += any(rolls[0] < roll for roll in rolls)

print("numpy : {}".format(np_wins / (n * 1.0)))
print("random : {}".format(rand_wins / (n * 1.0)))
Result:
numpy : 0.586
random : 0.688
The reason for the observed difference is that random.sample samples without replacement, while numpy.random.random_integers samples with replacement.
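A quick way to see the difference is to count how often a draw of three values contains a repeat. This is only a sketch: it uses random.randint from the standard library as a stand-in for any with-replacement sampler (not the numpy call from the question), and the helper name count_draws_with_repeats is made up for illustration.

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable


def count_draws_with_repeats(draw, trials=10_000):
    """Count how many draws of three values contain at least one repeated value."""
    return sum(len(set(draw())) < 3 for _ in range(trials))


# Without replacement: three distinct values out of 1..6 every time.
without_replacement = count_draws_with_repeats(
    lambda: random.sample(range(1, 7), 3)
)

# With replacement: three independent rolls, so repeats are possible.
with_replacement = count_draws_with_repeats(
    lambda: [random.randint(1, 6) for _ in range(3)]
)

print("random.sample draws with a repeat:", without_replacement)  # always 0
print("independent rolls with a repeat:", with_replacement)       # a sizeable share of the trials
```

For three fair dice, the chance of at least one repeat is 1 - (6*5*4)/216 = 96/216, about 44%, so random.sample is ruling out nearly half of the outcomes real dice would produce.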
random.sample() prevents duplicate values. It is like drawing numbers without replacing them, so a result like [1, 1, 1] will never occur.
np.random.random_integers(), on the other hand, is what you really want if you simulate three die rolls.
You can replace your random.sample() call with something like [random.randint(1, 6) for _ in range(3)] to achieve the same result.
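As a rough sketch of that replacement, here are both samplers run through the same win check, written for Python 3. The function name estimate_win_rate and the trial count are my own choices, not from the question, and the check compares the first value only against the other two.

```python
import random


def estimate_win_rate(draw, n=100_000):
    """Fraction of trials in which at least one later value beats the first."""
    wins = 0
    for _ in range(n):
        rolls = draw()
        wins += any(rolls[0] < roll for roll in rolls[1:])
    return wins / n


random.seed(1)  # fixed seed only so the sketch is repeatable

# Three independent die rolls (with replacement).
dice = estimate_win_rate(lambda: [random.randint(1, 6) for _ in range(3)])

# Three distinct values (without replacement), as in the question.
distinct = estimate_win_rate(lambda: random.sample(range(1, 7), 3))

print("three independent rolls:", dice)
print("random.sample:", distinct)
```

With this many trials the first estimate should land near 0.58 and the second near 0.67, which matches the gap seen in the question.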
Two problems here (one minor, one significant):
1. Your sample size is too small to get a good result. With only 500 rolls, I get results between 0.55 and 0.62, which is hardly accurate.
2. random.sample picks 3 items from the given sequence without putting them back. So you're not doing three dice rolls, you're picking three distinct numbers from the range [1, 6].
In fact, if I do that, the probability is about 67 %, whereas for the problem you stated it's around 58 %, as you observed.
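Both figures can be checked exactly by enumerating every outcome instead of simulating, and the same numbers give a feel for how noisy 500 rolls are. This is a sketch in Python 3; win_probability is a name I made up, and the standard error uses the usual binomial formula sqrt(p(1-p)/n).

```python
from fractions import Fraction
from itertools import permutations, product
from math import sqrt


def win_probability(outcomes):
    """Exact share of outcomes where some later value beats the first one."""
    outcomes = list(outcomes)
    wins = sum(any(first < other for other in rest) for first, *rest in outcomes)
    return Fraction(wins, len(outcomes))


faces = range(1, 7)

# Three independent rolls: all 6**3 = 216 ordered triples.
p_dice = win_probability(product(faces, repeat=3))

# Three distinct values: all 6*5*4 = 120 ordered triples without repeats.
p_distinct = win_probability(permutations(faces, 3))

# Standard error of a 500-trial estimate of p_dice.
std_error = sqrt(float(p_dice) * (1 - float(p_dice)) / 500)

print(p_dice, float(p_dice))  # 125/216, about 0.5787
print(p_distinct)             # 2/3
print(round(std_error, 3))    # 0.022
```

A standard error of about 0.022 means individual 500-roll runs landing anywhere from roughly 0.55 to 0.62 is expected, so a larger n is needed before the second decimal place means much.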
PowerShell test code I used:
Original problem statement:
(1..500 | %{
    $r = 0..2 | %{ Get-Random -min 1 -max 7 }
    !!($r|?{$r[0] -lt $_})
} | measure -ave).Average
Your flawed method:
(1..500 | %{
    $r = 1..6 | Get-Random -Count 3
    !!($r|?{$r[0] -lt $_})
} | measure -ave).Average
Those yield the same difference in results that you observed.