I noticed while doing numerical simulations a pattern in my data when I use normal numbers in julia.
I have an ensemble of random matrices. In order to do my calculations reproducible, I set the srand
function per-realization. That is, each time I use the function randn(n,n)
I initialize it with srand(j)
, where j
is the number of the realization.
I would like to know how the normal numbers are generated, and if it has meaning that doing what I do, I introduce accidental correlations.
Ideally, not at all. If you have any counterexamples, please file them as bugs on the Julia issue tracker. Julia uses the state-of-the-art Mersenne Twister library, dSFMT. This library is very fast and has been considered to use best practices for pseudo-random number generation. It has, however, recently come to my attention that there may be subtle statistical issues with PRNGs like MT in general – in particular with using small, consecutive seed values. To mitigate this if you're really worried about potential correlations, you could do something like this:
julia> using SHA
julia> srand(reinterpret(UInt32,sha256(string(1))))
MersenneTwister(UInt32[0x73b2866b,0xe1fc34ff,0x4e806b9d,0x573f5aff,0xeaa4ad47,0x491d2fa2,0xdd521ec0,0x4b5b87b7],Base.dSFMT.DSFMT_state(Int32[660235548,1072895699,-1083634456,1073365654,-576407846,1073066249,1877594582,1072764549,-1511149919,1073191776 … -710638738,1073480641,-1040936331,1072742443,103117571,389938639,-499807753,414063872,382,0]),[1.5382,1.36616,1.06752,1.17428,1.93809,1.63529,1.74182,1.30015,1.54163,1.05408 … 1.67649,1.66725,1.62193,1.26964,1.37521,1.42057,1.79071,1.17269,1.37336,1.99576],382)
julia> srand(reinterpret(UInt32,sha256(string(2))))
MersenneTwister(UInt32[0x3a5e73d4,0xee165e26,0x71593fe0,0x035d9b8b,0xd8079c01,0x901fc5b6,0x6e663ada,0x35ab13ec],Base.dSFMT.DSFMT_state(Int32[-1908998566,1072999344,-843508968,1073279250,-1560550261,1073676797,1247353488,1073400397,1888738837,1073180516 … -450365168,1073182597,1421589101,1073360711,670806122,388309585,890220451,386049800,382,0]),[1.5382,1.36616,1.06752,1.17428,1.93809,1.63529,1.74182,1.30015,1.54163,1.05408 … 1.67649,1.66725,1.62193,1.26964,1.37521,1.42057,1.79071,1.17269,1.37336,1.99576],382)
In other words, hash a string representation of a small integer seed value using a strong cryptographic hash like SHA2-256, and use the resulting hash data to seed the Mersenne Twister state. Ottoboni, Rivest & Stark suggest using a strong cryptographic hash for each random number generation, but that's going to be a massive slowdown (on current hardware) and is probably overkill unless you have an application that is really very sensitive to imperfect statistical randomness.
I should perhaps point out that Julia's behavior here is not worse than other languages, some of which use far worse random number generators by default, due to backwards compatibility considerations. This is an very recent research result (not yet published even). The technique I've suggested could be used to mitigate this issue in other languages as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With