Is Python's random.randint statistically random?

Question

So I'm testing and calculating the probabilities of certain dice rolls for a game. The base case is rolling one 10-sided die.

I did a million samples of this, and ended up with the following proportions:

Result  Proportion
0       0.000000000000000%
1       10.038789961210000%
2       10.043589956410000%
3       9.994890005110000%
4       10.025289974710000%
5       9.948090051909950%
6       9.965590034409970%
7       9.990190009809990%
8       9.985490014509990%
9       9.980390019609980%
10      10.027589972410000%

These should of course all be 10%. The standard deviation of these results is 0.0323207%, which seems rather high to me. Is it just coincidence? As I understand it, the random module provides proper pseudo-random numbers, i.e. ones from a method that passes the statistical tests for randomness. Or are these pseudo-pseudo-random number generators?

Should I be using cryptographic pseudo-random number generators? I'm fairly sure I don't need a true random number generator (see http://www.random.org/, http://en.wikipedia.org/wiki/Hardware_random_number_generator).

I am currently regenerating all my results with 1 billion samples (cos why not, I have a crunchy server at my disposal and some sleep to do).

Martijn Pieters · Accepted Answer

From the random module documentation:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.
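To make "completely deterministic" concrete, here is a minimal sketch: two generators created with the same seed produce exactly the same rolls. The seed value 12345 and the names rng_a and rng_b are arbitrary choices for illustration, not anything from the docs.

```python
import random

# Two independent Mersenne Twister instances with the same seed.
# 12345 is an arbitrary illustrative seed.
rng_a = random.Random(12345)
rng_b = random.Random(12345)

rolls_a = [rng_a.randint(1, 10) for _ in range(20)]
rolls_b = [rng_b.randint(1, 10) for _ in range(20)]

# Same seed, same sequence: the output is fully determined by the state.
print(rolls_a == rolls_b)
```

This reproducibility is useful for simulations, and it is also the reason the generator is unsuitable for cryptography.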

From the Wikipedia article on the Mersenne Twister:

It provides for fast generation of very high-quality pseudorandom numbers, having been designed specifically to rectify many of the flaws found in older algorithms.

If you have an OS-specific randomness source, available through os.urandom(), then you can use the random.SystemRandom() class instead. Most of the random module functions are available as methods on that class. It is likely more suitable for cryptographic purposes; quoting the docs again:

The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation.
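As a rough sketch of what switching over looks like, the dice tally from the question could be rerun against random.SystemRandom() like this. The sample size of 100,000 is just a small illustrative value, and sys_rng and counts are names I made up for the example.

```python
import random
from collections import Counter

# SystemRandom draws from os.urandom() rather than the Mersenne Twister,
# so it cannot be seeded for reproducible runs.
sys_rng = random.SystemRandom()

n = 100_000  # small illustrative sample size, not the one from the question
counts = Counter(sys_rng.randint(1, 10) for _ in range(n))

for face in range(1, 11):
    print('%2d  %.4f%%' % (face, counts[face] * 100.0 / n))
```

Expect this to be noticeably slower than the default generator, since every draw goes through the OS.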

Python 3.6 adds a secrets module with convenience methods to produce random data suitable for cryptographic purposes:

The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.

In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography.
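For illustration only, here is a small sketch of the secrets module (Python 3.6+). The roll_d10 helper is a hypothetical name of mine; secrets.randbelow() and secrets.token_hex() are the actual module functions.

```python
import secrets

def roll_d10():
    """Return an integer from 1 to 10 using the OS randomness source."""
    # randbelow(10) yields 0..9, so add 1 to get die faces 1..10
    return secrets.randbelow(10) + 1

rolls = [roll_d10() for _ in range(5)]
token = secrets.token_hex(16)  # 16 random bytes as 32 hex characters

print(rolls, token)
```

For a dice simulation like the one in the question this is overkill; it matters when the values must be unpredictable to an attacker.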

Steven Rumbalski · Answer

I reran the OP's exercise with one billion iterations:

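from collections import Counter
import random

n = 1000000000  # one billion rolls
# Tally how often each face comes up
c = Counter(random.randint(1, 10) for _ in range(n))
for i in range(1, 11):
    # Print each face's share of all rolls as a percentage
    print('%2s  %02.10f%%' % (i, c[i] * 100.0 / n))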

Here's the (reformatted) result:

 1     9.9996500000%
 2    10.0011089000%
 3    10.0008568000%
 4    10.0007495000%
 5     9.9999089000%
 6     9.9985344000%
 7     9.9994913000%
 8     9.9997877000%
 9    10.0010818000%
10     9.9988307000%

See the other answers to this question for their excellent analysis.
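One way to judge whether the spread in the question is "rather high" is to compare it with what a fair die would give by chance alone. The sketch below assumes each face count is binomial with p = 0.1, so the standard deviation of an observed proportion is sqrt(p(1-p)/n), and it also computes a Pearson chi-square statistic. The percentages are the question's figures rounded to five decimals; the variable names and the 5% significance level are my own choices.

```python
import math
import statistics

# Percentages reported in the question for faces 1..10 (rounded here)
observed_pct = [10.03879, 10.04359, 9.99489, 10.02529, 9.94809,
                9.96559, 9.99019, 9.98549, 9.98039, 10.02759]
n = 1_000_000   # sample size stated in the question
p = 0.1         # probability of each face on a fair 10-sided die

# Standard deviation of an observed proportion under the fair-die model
expected_sd_pct = 100 * math.sqrt(p * (1 - p) / n)
# Sample standard deviation of the ten reported percentages
observed_sd_pct = statistics.stdev(observed_pct)

# Pearson chi-square statistic with 10 - 1 = 9 degrees of freedom
chi2 = sum((n * o / 100 - n * p) ** 2 / (n * p) for o in observed_pct)
CRITICAL_5PCT_DF9 = 16.919  # standard table value, 5% level, 9 d.o.f.

print('expected sd: %.4f%%' % expected_sd_pct)
print('observed sd: %.4f%%' % observed_sd_pct)
print('chi-square:  %.2f (reject fairness only above %.3f)'
      % (chi2, CRITICAL_5PCT_DF9))
```

Under these assumptions the expected standard deviation for a million rolls is about 0.03%, so a figure of 0.0323% is roughly what chance predicts. The same formula says the spread should shrink by a factor of sqrt(1000) when going from a million to a billion rolls.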

Tags: python, random, statistics, probability

Lyndon White

