I have an implementation of a pseudo random number generator, specifically of George Marsaglia's XOR-Shift RNG. My implementation is here:
FastRandom.cs
It turns out that the first random sample is very closely correlated with the seed, which is fairly obvious if you take a look at the Reinitialise(int seed) method. This is bad. My proposed solution is to mix up the bits of the seed as follows:
_x = (uint)( (seed * 2147483647)
^ ((seed << 16 | seed >> 48) * 28111)
^ ((seed << 32 | seed >> 32) * 69001)
^ ((seed << 48 | seed >> 16) * 45083));
So I have significantly weakened any correlation by multiplying the seed's bits with four primes and XORing back to form _x. I also rotate the seed's bits before multiplication to ensure that bits of varying magnitudes get mixed up across the full range of values for a 32 bit value.
The four-way rotation just seemed liked a nice balance between doing nothing and every possible rotation (32). The primes are 'finger in the air' - enough magnitude and bit structure to jumble up the bits and 'spread' them over the full 32 bits regardless of the starting seed.
Should I use bigger primes? Is there a standard approach to this problem, perhaps with a more formal basis? I am trying to do this with minimal CPU overhead.
Thanks
=== UPDATE ===
I decided to use some primes with set bits better distributed across all 32 bits. The result is that I can omit the shifts as the multiplications achieve the same effect (hashing bits across the full range of 32 bits), so I then just add the four products to give the final seed...
_x = (uint)( (seed * 1431655781)
+ (seed * 1183186591)
+ (seed * 622729787)
+ (seed * 338294347));
I could possibly get away with fewer primes/multiplciations. Two seemed too few (I could still see patterns in the first samples), three looked OK, so for a safety margin I made it four.
=== UPDATE 2 ===
FYI the above reduces to the functionally equivalent:
_x = seed * 3575866506U;
I didn't spot this initially and when I did I was wondering if overflowing at different stages in the calculation would cause a different result. I believe the answer is no - the two calculations always give the same answer.
According to some researchers, CrapWow, Crap8 and Murmur3 are the best non-cryptographic hash algorithms available today that are both fast, simple and statistically good.
More information is available at Non-Cryptographic Hash Function Zoo.
Edit: As of May, 2021 the floodberry.com links to the Non-Cryptographic Hash Function Zoo are not valid. The content can still be found on archive.org.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With