Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird biased random number generation with randomR

Tags:

haskell

I assume I am doing something wrong here:

ghci> take 10 $ randomRs (1,6) (mkStdGen 2)
[6,4,1,5,4,2,2,2,2,3]
ghci> take 10 $ randomRs (1,6) (mkStdGen 3)
[6,4,5,4,4,2,1,1,5,1]
ghci> take 10 $ randomRs (1,6) (mkStdGen 5)
[6,2,2,1,3,2,5,1,5,4]
ghci> take 10 $ randomRs (1,6) (mkStdGen 7)
[6,1,4,5,3,2,3,6,6,6]
ghci> take 10 $ randomRs (1,6) (mkStdGen 11)
[6,4,4,6,1,2,6,5,6,5]

Why is the first random number always "randomly" 6 ... ?

Same pattern with "n" for letters from a to z:

ghci> take 10 $ randomRs ('a','z') (mkStdGen 13)
"nnofwbxbtw"
ghci> take 10 $ randomRs ('a','z') (mkStdGen 17)
"novkmtfugl"
ghci> take 10 $ randomRs ('a','z') (mkStdGen 19)
"nhurafjvey"

I stumbled upon this working through LYAHFGG (chapter 9 on randomness) and really fail to make sense of it. I would expect no pattern across seeds.

like image 613
Raffael Avatar asked Jan 09 '23 02:01

Raffael


2 Answers

System.Random.StdGen is a horrible generator that should have been replaced long ago. There are even quickcheck properties that are obviously false which still pass after thousands of checks because of the poor quality of StdGen - and that is done with good seeding.

There are numerous strong generators both of a cryptographic variety and a statistical. Consider using the tf-random package for one proposed replacement to StdGen, DRBG for a cryptographic random generator, or mersenne-random-pure64 for a fast and statistically good generator.

like image 124
Thomas M. DuBuisson Avatar answered Jan 26 '23 03:01

Thomas M. DuBuisson


Those seeds are all very small numbers, as in, much smaller than maxBound :: Int. They are thus very much not a representative sample of real-world internal states for the random generator. And basically, the first result of randomR is only "as random as the seed", WRT the respective ranges.

If you want variation in the first result, you'll need to properly hash the seed, so the values are decently scattered across the permitted integer range. The easiest way to do that is to discard the first result.

like image 22
leftaroundabout Avatar answered Jan 26 '23 04:01

leftaroundabout