Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are initial random numbers similar when using similar seeds?

Tags:

java

random

I discovered something strange with the generation of random numbers using Java's Random class. Basically, if you create multiple Random objects using close seeds (for example between 1 and 1000) the first value generated by each generator will be almost the same, but the next values looks fine (i didn't search further).

Here are the two first generated doubles with seeds from 0 to 9 :

  • 0 0.730967787376657 0.24053641567148587
  • 1 0.7308781907032909 0.41008081149220166
  • 2 0.7311469360199058 0.9014476240300544
  • 3 0.731057369148862 0.07099203475193139
  • 4 0.7306094602878371 0.9187140138555101
  • 5 0.730519863614471 0.08825840967622589
  • 6 0.7307886238322471 0.5796252073129174
  • 7 0.7306990420600421 0.7491696031336331
  • 8 0.7302511331990172 0.5968915822372118
  • 9 0.7301615514268123 0.7664359929590888

And from 991 to 1000 :

  • 991 0.7142160704801332 0.9453385235522973
  • 992 0.7109015598097105 0.21848118381994108
  • 993 0.7108119780375055 0.38802559454181795
  • 994 0.7110807233541204 0.8793923921785096
  • 995 0.7109911564830766 0.048936787999225295
  • 996 0.7105432327208906 0.896658767102804
  • 997 0.7104536509486856 0.0662031629235198
  • 998 0.7107223962653005 0.5575699754613725
  • 999 0.7106328293942568 0.7271143712820883
  • 1000 0.7101849056320707 0.574836350385667

And here is a figure showing the first value generated with seeds from 0 to 100,000.

First random double generated based on the seed :

image

I searched for information about this, but I didn't see anything referring to this precise problem. I know that there is many issues with LCGs algorithms, but I didn't know about this one, and I was wondering if this was a known issue.

And also, do you know if this problem only for the first value (or first few values), or if it is more general and using close seeds should be avoided?

Thanks.

like image 680
Jeg Avatar asked Sep 05 '12 13:09

Jeg


2 Answers

You'd be best served by downloading and reading the Random source, as well as some papers on pseudo-random generators, but here are some of the relevant parts of the source. To begin with, there are three constant parameters that control the algorithm:

private final static long multiplier = 0x5DEECE66DL; private final static long addend = 0xBL; private final static long mask = (1L << 48) - 1; 

The multiplier works out to approximately 2^34 and change, the mask 2^48 - 1, and the addend is pretty close to 0 for this analysis.

When you create a Random with a seed, the constructor calls setSeed:

synchronized public void setSeed(long seed) {     seed = (seed ^ multiplier) & mask;     this.seed.set(seed);     haveNextNextGaussian = false; } 

You're providing a seed pretty close to zero, so initial seed value that gets set is dominated by multiplier when the two are OR'ed together. In all your test cases with seeds close to zero, the seed that is used internally is roughly 2^34; but it's easy to see that even if you provided very large seed numbers, similar user-provided seeds will yield similar internal seeds.

The final piece is the next(int) method, which actually generates a random integer of the requested length based on the current seed, and then updates the seed:

protected int next(int bits) {     long oldseed, nextseed;     AtomicLong seed = this.seed;     do {     oldseed = seed.get();     nextseed = (oldseed * multiplier + addend) & mask;     } while (!seed.compareAndSet(oldseed, nextseed));     return (int)(nextseed >>> (48 - bits)); } 

This is called a 'linear congruential' pseudo-random generator, meaning that it generates each successive seed by multiplying the current seed by a constant multiplier and then adding a constant addend (and then masking to take the lower 48 bits, in this case). The quality of the generator is determined by the choice of multiplier and addend, but the ouput from all such generators can be easily predicted based on the current input and has a set period before it repeats itself (hence the recommendation not to use them in sensitive applications).

The reason you're seeing similar initial output from nextDouble given similar seeds is that, because the computation of the next integer only involves a multiplication and addition, the magnitude of the next integer is not much affected by differences in the lower bits. Calculation of the next double involves computing a large integer based on the seed and dividing it by another (constant) large integer, and the magnitude of the result is mostly affected by the magnitude of the integer.

Repeated calculations of the next seed will magnify the differences in the lower bits of the seed because of the repeated multiplication by the constant multiplier, and because the 48-bit mask throws out the highest bits each time, until eventually you see what looks like an even spread.

like image 144
Alex Avatar answered Oct 15 '22 00:10

Alex


I wouldn't have called this an "issue".

And also, do you know if this problem only for the first value (or first few values), or if it is more general and using close seeds should be avoided?

Correlation patterns between successive numbers is a common problem with non-crypto PRNGs, and this is just one manifestation. The correlation (strictly auto-correlation) is inherent in the mathematics underlying the algorithm(s). If you want to understand that, you should probably start by reading the relevant part of Knuth's Art of Computer Programming Chapter 3.

If you need non-predictability you should use a (true) random seed for Random ... or let the system pick a "pretty random" one for you; e.g. using the no-args constructor. Or better still, use a real random number source or a crypto-quality PRNG instead of Random.


For the record:

  1. The javadoc (Java 7) does not specify how Random() seeds itself.
  2. The implementation of Random() on Java 7 for Linux, is seeded from the nanosecond clock, XORed with a 'uniquifier' sequence. The 'uniquifier' sequence is LCG which uses different multiplier, and whose state is static. This is intended to avoid auto-correlation of the seeds ...
like image 31
Stephen C Avatar answered Oct 15 '22 01:10

Stephen C