Attempted problem: The probability that at least one of two dice will show a higher value than a third die.
Problem: For some reason, when I use the random module from Python (specifically the sample method), I end up with a different (and incorrect) result than when I use numpy. I've included the results at the bottom. Repeated execution of the code yields similar results. Any ideas why random.sample and numpy.random.random_integers give different results even though they appear to do the same thing?
import numpy as np                                                              
import random                                                                   
random_list = []                                                                
numpy_list = []                                                                 
n= 500                                                                          
np_wins = 0                                                                     
rand_wins = 0                                                                   
for i in range(n):                                                              
    rolls = random.sample(range(1,7), 3)                                        
    rand_wins += any(rolls[0] < roll for roll in rolls)                         
    rolls = np.random.random_integers(1, 6, 3)                                  
    np_wins += any(rolls[0] < roll for roll in rolls)                           
print "numpy : {}".format(np_wins/(n * 1.0))                                    
print "random : {}".format(rand_wins/(n * 1.0))           
Result:
numpy : 0.586
random : 0.688
The reason for the observed difference is that random.sample samples without replacement (see its documentation), while numpy.random.random_integers samples with replacement.
random.sample() prevents duplicate values. It is like drawing numbers without putting them back, so a result like [1, 1, 1] will never occur.
np.random.random_integers(), on the other hand, is what you really want if you simulate three dice rolls.
You can replace your random.sample() call with something like [random.randint(1, 6) for _ in range(3)] to get the same behaviour as the numpy version.
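As a rough illustration of that replacement, the sketch below runs the same comparison with both ways of drawing. The helper names (first_die_beaten, estimate, with_replacement, without_replacement), the fixed seed and the trial count are choices made for this example and are not part of the original code.

```python
import random

def first_die_beaten(rolls):
    # True if any other die shows a strictly higher value than the first one
    return any(rolls[0] < roll for roll in rolls)

def with_replacement(rng):
    # Three independent dice rolls; duplicates such as [1, 1, 1] can occur
    return [rng.randint(1, 6) for _ in range(3)]

def without_replacement(rng):
    # Three distinct numbers from 1..6; duplicates can never occur
    return rng.sample(range(1, 7), 3)

def estimate(draw, n, seed=0):
    # Fraction of n trials in which the first die is beaten
    rng = random.Random(seed)
    wins = sum(first_die_beaten(draw(rng)) for _ in range(n))
    return wins / float(n)

p_dice = estimate(with_replacement, 100000)
p_distinct = estimate(without_replacement, 100000)
print("three dice rolls      : {:.3f}".format(p_dice))
print("three distinct numbers: {:.3f}".format(p_distinct))
```

With this many trials the first figure should land near 0.58 and the second near 0.67, matching the gap seen in the question.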
Two problems here (one minor, one significant):
Your sample size is too small to give a good result. If I do only 500 rolls, I get a result anywhere between 0.55 and 0.62. Hardly accurate.
random.sample picks 3 items from the given sequence without putting them back. So you're not doing three dice rolls, you're picking three distinct numbers from the range [1, 6].
In fact, if I do that, the probability is about 67%, whereas for the problem you stated it's around 58%, as you observed.
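Those two percentages can also be checked exactly rather than by simulation, since there are only 216 ordered outcomes for three dice and 120 ordered draws of three distinct numbers. The following is a minimal sketch that enumerates both cases; the helper names are made up for this example.

```python
from fractions import Fraction
from itertools import permutations, product

def first_die_beaten(rolls):
    # True if any other die shows a strictly higher value than the first one
    return any(rolls[0] < roll for roll in rolls)

def exact_probability(outcomes):
    # Share of equally likely outcomes in which the first die is beaten
    outcomes = list(outcomes)
    hits = sum(first_die_beaten(o) for o in outcomes)
    return Fraction(hits, len(outcomes))

faces = range(1, 7)
p_dice = exact_probability(product(faces, repeat=3))    # with replacement
p_distinct = exact_probability(permutations(faces, 3))  # without replacement

print("three dice rolls      : {} = {:.4f}".format(p_dice, float(p_dice)))
# three dice rolls      : 125/216 = 0.5787
print("three distinct numbers: {} = {:.4f}".format(p_distinct, float(p_distinct)))
# three distinct numbers: 2/3 = 0.6667
```

The 2/3 has a simple explanation: among three distinct numbers, the first one drawn is the largest one third of the time, so it is beaten the remaining two thirds.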
PowerShell test code I used:
Original problem statement:
(1..500 | %{
   $r = 0..2 | %{ Get-Random -min 1 -max 7 }
   !!($r|?{$r[0] -lt $_})
} | measure -ave).Average
Your flawed method:
(1..500 | %{
   $r = 1..6 | Get-Random 3
   !!($r|?{$r[0] -lt $_})
} | measure -ave).Average
Those show the same difference in results that you observed.
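On the sample-size point, a back-of-the-envelope sketch of how much noise to expect: it uses the standard error of a proportion, sqrt(p(1-p)/n), and assumes the true probability for three dice is 125/216 (about 0.58, i.e. one minus the chance that both other dice are at or below the first). The function name and the choice of trial counts are only for illustration.

```python
import math

def standard_error(p, n):
    # Standard error of an estimated proportion after n independent trials
    return math.sqrt(p * (1.0 - p) / n)

p = 125.0 / 216.0  # assumed true probability for three dice rolls

for n in (500, 5000, 50000):
    se = standard_error(p, n)
    # Roughly 95% of runs fall within two standard errors of p
    print("n = {:>6}: estimate typically within +/- {:.3f}".format(n, 2 * se))
```

At n = 500 the spread is a little over 0.04 either side, which is in line with the 0.55 to 0.62 range mentioned above; each tenfold increase in n narrows it by roughly a factor of three.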