This question concerns the origins of temporal correlations one observes with System.Random
when one generates successive randoms from successive seeds (where one discards the same number of generators for each seed).
In Using mkStdGen from System.Random to generate random booleans Answer 1 and Using mkStdGen from System.Random to generate random booleans Answer 2 it was suggested (based on the reddit article referenced theirin) that the first few generators should be discarded in order to get sensible results. However I find that irrespective of how many generators one discards, when one looks at the temporal aspect of the distribution one obtains undesirable results if successive random numbers are generated with successive seeds (with one discarding the same number of generators for each seed).
My question is what is it about the algorithm employed in System.Random
that results in this temporal correlation between different seeds in the manner described?
If we generate an infinite sequence of random booleans, then the probability P(n)
of getting n
successive booleans which are of the same value (e.g. the [True,True,True]
in [False,True,True,True,False]
) is (1/2)^n
. As a
quick check we have the normalisation condition :
P(1)+P(2)+....P(infty) = (1/2) + (1/2)^2 + ... = 1
The following code :
module Main where
import Data.List
import System.Random
generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
where newGen = snd $ ((random startGen) :: (Bool,StdGen))
better_mkStdGen generation seed =
generateNthGenerator (mkStdGen seed) generation
randomNums generation =
map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False]
sortedLengthOfConsecutives num randList =
sort $ map length $ take num $ group randList
frequencyOfConsecutives sortedLengthOfCons =
map (\x -> (head x, length x)) $ group sortedLengthOfCons
results = frequencyOfConsecutives
$ sortedLengthOfConsecutives 10000
$ randomNums 10
main = do
print results -- [(8,1493),(9,8507)]
generates each successive boolean using generators from the consecutive seed, discarding 10 generators before using the resulting random result. A sequence of 10000 random numbers is generated, and so we expect roughly 5000 booleans to be followed by the opposite boolean (e.g. [True]
in [False,True,False,False]
), for there to be 2500 booleans which are followed by the same boolean but then followed by the opposed boolean (e.g. [True,True]
in [False,True,True,False]
), about 1250 booleans which group into 3s, etc.
So from the code above we get 1493 8-groups and 8507 9-groups. This is not what is expected, and we get similar results irrespective of how many generators are discarded (in the example above the number of generators discarded for each seed is 10). [Note when we do the same experiment with tf-random
we don't get the same behaviour, see later on].
If we instead generate successive booleans using the previously generated generator (which is I guess the fashion in which it was originally designed to be used, since random
itself returns a new generator), we seem to get more reasonable results :
module Main where
import Data.List
import System.Random
generateRandomInner gen =
let (randBool, newGen) = (random gen)::(Bool,StdGen)
in randBool:(generateRandomInner newGen)
generateRandoms seed =
let (randBool, newGen) = (random $ mkStdGen seed)::(Bool,StdGen)
in randBool:(generateRandomInner newGen)
seed = 0
randomNums = generateRandoms seed
sortedLengthOfConsecutives num randList =
sort $ map length $ take num $ group randList
frequencyOfConsecutives sortedLengthOfCons =
map (\x -> (head x, length x)) $ group sortedLengthOfCons
results = frequencyOfConsecutives
$ sortedLengthOfConsecutives 10000
$ randomNums
main = do
print results
--[(1,4935),(2,2513),(3,1273),(4,663),(5,308),
-- (6,141),(7,86),(8,45),(9,16),(10,12),(11,6),
-- (12,1),(13,1)]
So we get 4935 singletons (roughly equals 0.5*10000), 2513 duples (roughly equals 0.5^2*10000), 1273 triples (roughly equals 0.5^3*10000) etc, which is what we expect.
So it seems that if we generate (via System.Random
) a random sequence where each successive random is generated with the successive seed, where we discard the same number of generators for each seed, an undesirable correlation persists irrespective number of generators discarded.
What is it about the algorithmic properties of the random number generation of
System.Random
that result in this?
Note that if we employ the failing method above with tf-random
(redditt article) then we get the expected results :
module Main where
import Data.List
import System.Random
import System.Random.TF
generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
where newGen = snd $ ((random startGen) :: (Bool,TFGen))
better_mkStdGen generation seed =
generateNthGenerator (seedTFGen (0,0,0,seed)) generation
randomNums generation =
map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False]
sortedLengthOfConsecutives num randList =
sort $ map length $ take num $ group randList
frequencyOfConsecutives sortedLengthOfCons =
map (\x -> (head x, length x)) $ group sortedLengthOfCons
results = frequencyOfConsecutives
$ sortedLengthOfConsecutives 10000
$ randomNums 10
main = do
print results
-- [(1,4867),(2,2573),(3,1243),(4,646),(5,329),
-- (6,176),(7,80),(8,43),(9,26),(10,10),(11,4),
-- (12,2),(19,1)]
i.e. 50% are singletons, 25% are duples (groups of 2), etc...
Let's begin by looking at what the code says, and we can get there.
First off, random
as applied to Bool
is equivalent to:
randomB :: StdGen -> (Bool, StdGen)
randomB g = let (i, g') = next g in (i `mod` 2 == 1, g')
In fact, if I replace random
with randomB
where that's appropriate in your program, I get identical results. The point is that for determining booleans, all we care about is whether the next Int
value is even or odd.
Next, let's look at the definition of StdGen
:
data StdGen
= StdGen Int32 Int32
So two Int32
s are the state. Let's see how they're initialized with mkStdGen
and how they're adjusted with next
:
mkStdGen :: Int -> StdGen -- why not Integer ?
mkStdGen s = mkStdGen32 $ fromIntegral s
mkStdGen32 :: Int32 -> StdGen
mkStdGen32 s
| s < 0 = mkStdGen32 (-s)
| otherwise = StdGen (s1+1) (s2+1)
where
(q, s1) = s `divMod` 2147483562
s2 = q `mod` 2147483398
...
stdNext :: StdGen -> (Int, StdGen)
-- Returns values in the range stdRange
stdNext (StdGen s1 s2) = (fromIntegral z', StdGen s1'' s2'')
where z' = if z < 1 then z + 2147483562 else z
z = s1'' - s2''
k = s1 `quot` 53668
s1' = 40014 * (s1 - k * 53668) - k * 12211
s1'' = if s1' < 0 then s1' + 2147483563 else s1'
k' = s2 `quot` 52774
s2' = 40692 * (s2 - k' * 52774) - k' * 3791
s2'' = if s2' < 0 then s2' + 2147483399 else s2'
Note two interesting things:
The way s2
is initialized guarantees that it will be 1 unless you send a very high number to mkStdGen
, in which case it will be 2 (there are fewer than 200 values in the Int32
range that will initialize s2
to 2.
The two halves of the state are kept distinct - the updated s2
depends only on the previous s2
, not on the previous s1
, and vice versa.
As a consequence, if you examine the generators that you get out of a certain fixed number of generations passed to better_mkStdGen
, the second half of their state will always be identical.
Try it by adding this to your program:
print $ map (dropWhile (/= ' ') . show . better_mkStdGen 10) [0 .. 20]
So then the question is, why the mixing function in s1
fails to mix the last bit properly. Note that the way it's written, s1'
and k
will have the same parity as s1
, so the s1
state only has different parity from the previous s1
state if s1'
ends up being less than zero.
At this point I need to handwave a bit and say that the way s1'
is computed means that if two initial values for s1
are close to each other, the two values for s1'
will also be close together, and in general will only be 40014 times as far apart as they were initially, which in the range we allow for s1
makes neighboring values quite likely to end up on the same side of zero.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With