I am trying to find the statistical properties of the PRNGs available in Python (2.7.10) by using the frequency test, runs test and the chi squared test.
For carrying out the frequency test, I need to convert the generated random number to its binary representation and then count the distribution of 1
's and 0
's. I was experimenting with the binary representation of the random numbers on the python console and observed this weird behavior:
>>> for n in random.sample(xrange(1, sys.maxsize), 50): ... print '{0:b}'.format(n) ... 101101110011011001110011110110101101101101111111101000000000001 110000101001001011101001110111111110011000101011100010000000001 110111101101110011100010001010000101011111110010001110000000001 100001111010011000101001000001000011001111100000001010000000001 1111000010010011111100111110110100100011110111010000000000001 111000001011101011101110100001001001000011011001110110000000001 1000100111011000111000101010000101010100110111000100000000001 11101001000001101111110101111011001000100011011011010000000001 110011010111101101011000110011011001110001111000001010000000001 110110110110111100011111110111011111101000011001100000000001 100010010000011101011100110101011110111100001100100000000000001 10111100011010011010001000101011001110010010000010010000000001 101011100110110001010110000101100000111111011101011000000000001 1111110010110010000111111000010001101011011010101110000000001 11100010101101110110101000101101011011111101101000010000000001 10011110110110010110011010000110010010111001111001010000000001 110110011100111010100111100100000100011101100001100000000000001 100110011001101011110011010101111101100010000111001010000000001 111000101101100111110010110110100110111001000101000000000000001 111111101000010111001011111100111100011101001011010000000001 11110001111100000111010010011111010101101110111001010000000001 100001100101101100010101111100111101111001101010101010000000001 11101010110011000001101110000000001111010001110111000000000001 100111000110111010001110110101001011100101111101010000000001 100001101100000011101101010101111111011010111110111110000000001 100010010011110110111111111000010001101100111001001100000000001 110011111110010011000110101010101001001010000100011010000000001 1111011010100001001101101000011100001011001110010100000000001 110110011101100101001100111010101111001011111101100000000000001 1010001110100101001001011111000111011100001100000110000000001 1000101110010011011000001011010110001000110100100100000000001 11111110011001011100111110110111000001000100100010000000000001 101111101010000101010111111111000001100101111001011110000000001 10010010111111111100000001010010101100111001100000000000001 111110000001110010001110111101110101010110001110000000000000001 100000101101000110101010010000101101000011111010001110000000001 101001011101100011001000011010010000000111110111100010000000001 10110101010000111010110111001111011000001111001100110000000001 10110111100100100011100101001100000000101110100100010000000001 10010111110001011101001110000111011010110100110111110000000001 111011110010110111011011101011001100001000111001010100000000001 101001010001010100010010010001100111101110101111000110000000001 101011111010000101010101000110001101001001011110000000000001 1010001010111101101010111110110110000001111101101110000000001 10111111111010001000110000101101010101011010101100000000001 101011101010110000001111010100100110000011111100100100000000001 111100001101111010100111010001010010000010110110010110000000001 100111111000100110100001110101000010111111010010010000000000001 100111100001011100011000000000101100111111000111100110000000001 110110100000110111011101110101101000101110111111010110000000001 >>>
As you can see, all numbers end in 0000000001
, i.e all numbers are 1 mod 2^10
. Why is this so ?
Also, this behavior is observed when the range is 1 to sys.maxsize
. If the range is specified to be 1 to 2^40
, this is not observed. I want to know the reason for this behavior and whether there is anything wrong in my code.
The documentation for the random library that implements the PRNGs that I am using is here.
Let me know if I should provide any more information.
The randint() method Syntax Basically, the randint() method in Python returns a random integer value between the two lower and higher limits (including both limits) provided as two parameters. It should be noted that this method is only capable of generating integer-type random value.
@roeland hinted at the cause: in Python 2, sample()
uses int(random.random() * n)
repeatedly. Look at the source code (in your Python's Lib/random.py
) for full details. In short, random.random()
returns no more than 53 significant (non-zero) leading bits; then int()
fills the rest of the low-order bits with zeroes (you're obviously on a machine where sys.maxsize == 2**63 - 1
); then indexing your base (xrange(1, sys.maxsize)
) by an even integer with "a lot" of of low-order 0 bits always returns an odd integer with the same number of low-order 0 bits (except for the last).
In Python 3 none of that happens - random
in Python 3 uses stronger algorithms, and only falls back to random.random()
when necessary. For example, here under Python 3.4.3:
>>> hex(random.randrange(10**70)) '0x91fc11ed768be3a454bd66f593c218d8bbfa3b99f6285291e1d9f964a9' >>> hex(random.randrange(10**70)) '0x7b07ff02b6676801e33094fca2fcca7f6e235481c479c521643b1acaf4'
Here's a more directly relevant example, under 3.4.3 on a 64-bit box:
>>> import random, sys >>> sys.maxsize == 2**63 - 1 True >>> for i in random.sample(range(1, sys.maxsize), 6): ... print(bin(i)) 0b10001100101001001111110110011111000100110100111001100000010110 0b100111100110110100111101001100001100110001110010000101101000101 0b1100000001110000110100111101101010110001100110101111011100111 0b111110100001111100101001001001101101100100011001001010100001110 0b1100110100000011100010000011010010100100110111001111100110100 0b10011010000110101010101110001000101110111100100001111101110111
Python 3 doesn't invoke random.random()
at all in this case, but instead iteratively grabs chunks of 32 bits from the underlying Mersenne Twister (32-bit unsigned ints are "the natural" outputs from this implementation of MT) , pasting them together to build a suitable index. So, in Python 3, platform floats have nothing to do with it; in Python 2, quirks of float behavior have everything to do with it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With