
Random.nextInt(int) is [slightly] biased

Tags: java, random

Namely, it will never generate more than 16 even numbers in a row with some specific upperBound parameters:

Random random = new Random();

int c = 0;
int max = 17;
int upperBound = 18;

while (c <= max) {
    int nextInt = random.nextInt(upperBound);
    boolean even = nextInt % 2 == 0;
    if (even) {
        c++;
    } else {
        c = 0;
    }
}

In this example the code loops forever, whereas with an upperBound of, say, 16 it terminates quickly.

What can be the reason for this behavior? There are some notes in the method's javadoc, but I failed to understand them.


UPD1: The code seems to terminate with odd upper bounds, but may get stuck with even ones.


UPD2: I modified the code to capture the statistics of c as suggested in the comments:

Random random = new Random();

int c = 0;
long trials = 1 << 58; // evaluated as an int shift, so this is 1 << 26 = 67108864 trials, not 2^58
int max = 20;
int[] stat = new int[max + 1];

while (trials > 0) {
    while (c <= max && trials > 0) {
        int nextInt = random.nextInt(18);
        boolean even = nextInt % 2 == 0;
        if (even) {
            c++;
        } else {
            stat[c] = stat[c] + 1;
            c = 0;
        }
        trials--;
    }
}

System.out.println(Arrays.toString(stat));

Now it tries to reach 20 evens in a row, to get better statistics, and the upperBound is still 18.

The results turned out to be more than surprising:

[16776448, 8386560, 4195328, 2104576, 1044736, 
 518144, 264704, 132096, 68864, 29952, 15104, 
 12032, 1792, 3072, 256, 512, 0, 256, 0, 0]

At first the counts decrease by a factor of 2, as expected, but note the last line! There the pattern breaks down and the captured statistics look completely irregular.

Here is a bar plot in log scale:

[Figure: c statistics]

How c reaches the value 17 exactly 256 times is yet another mystery.

asked Jul 24 '13 10:07 by Alexey Grigorev



2 Answers

http://docs.oracle.com/javase/6/docs/api/java/util/Random.html:

An instance of this class is used to generate a stream of pseudorandom numbers. The class uses a 48-bit seed, which is modified using a linear congruential formula. (See Donald Knuth, The Art of Computer Programming, Volume 3, Section 3.2.1.)

If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. [...]

It is a pseudo-random number generator. This means that you are not actually rolling dice; instead, a formula calculates the next "random" value from the current one. To create the illusion of randomness, a seed is used: the seed is the first value fed into the formula.

Apparently Java's Random implementation (the "formula") does not generate more than 16 even numbers in a row.

This behaviour is the reason why the seed is usually initialized with the time. Depending on when you start your program, you will get different results.

The benefits of this approach are that you can generate repeatable results. If you have a game generating "random" maps, you can remember the seed to regenerate the same map if you want to play it again, for instance.
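The repeatability described above can be checked with a minimal sketch. The class name `SameSeedDemo`, the helper `draw`, and the seed value 42 are made up for this illustration; only `java.util.Random` and its `nextInt(int)` come from the JDK.

```java
import java.util.Arrays;
import java.util.Random;

class SameSeedDemo {
    // Draws `count` values in [0, bound) from a Random created with the given seed.
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary seed, chosen only for this example
        int[] first = draw(seed, 10, 18);
        int[] second = draw(seed, 10, 18);
        // Two generators with the same seed and the same calls agree exactly.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```

Storing the seed is therefore enough to replay the whole sequence later, which is what the map example relies on.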

For true random numbers, some operating systems provide special devices that generate "randomness" from external events such as mouse movements or network traffic. However, I do not know how to tap into those from Java.

From the Javadoc for SecureRandom:

Many SecureRandom implementations are in the form of a pseudo-random number generator (PRNG), which means they use a deterministic algorithm to produce a pseudo-random sequence from a true random seed. Other implementations may produce true random numbers, and yet others may use a combination of both techniques.

Note that SecureRandom does NOT guarantee true random numbers either.

Why changing the seed does not help

Let's assume random numbers only have the range 0-7. Now we use the following formula to generate the next "random" number:

 next = (current + 3) % 8

Starting from 0, the sequence becomes 0 3 6 1 4 7 2 5.

If you now take the seed 3, all you do is change the starting point.

In this simple implementation, which only uses the previous value, every value can occur only once before the sequence wraps around and starts again. Otherwise there would be an unreachable part.
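The toy formula can be tried out directly. This is only a sketch of the simplified generator from this answer, not of java.util.Random; the class name `ToyGenerator` and the method `sequence` are invented for the illustration.

```java
import java.util.Arrays;

class ToyGenerator {
    // Lists `count` values, starting at `seed`, using next = (current + 3) % 8.
    static int[] sequence(int seed, int count) {
        int[] out = new int[count];
        int current = seed;
        for (int i = 0; i < count; i++) {
            out[i] = current;
            current = (current + 3) % 8;
        }
        return out;
    }

    public static void main(String[] args) {
        // Seed 0 walks through all eight values once.
        System.out.println(Arrays.toString(sequence(0, 8))); // [0, 3, 6, 1, 4, 7, 2, 5]
        // Seed 3 yields the same cycle, just entered at a different point.
        System.out.println(Arrays.toString(sequence(3, 8))); // [3, 6, 1, 4, 7, 2, 5, 0]
    }
}
```

Whatever pattern the cycle contains (or lacks) is the same for every seed; the seed only selects where in the cycle you begin.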

E.g. imagine the sequence 0 3 6 1 3 4 7 2 5. The numbers 0, 4, 7, 2 and 5 would never be generated more than once (depending on the seed, they might never be generated at all), because once the sequence reaches 3 it loops: 3, 6, 1, 3, 6, 1, ...

Simplified pseudo-random number generators can be thought of as a permutation of all numbers in the range, with the seed as the starting point. For more advanced generators you would have to replace the permutation with a list in which the same number may occur multiple times.

More complex generators can have an internal state, allowing the same number to occur several times in the sequence, since the state lets the generator know where to continue.
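As a rough illustration of internal state, the toy generator can be given a hidden 4-bit state of which only the upper 3 bits are returned. Everything here is hypothetical: the class `StatefulToyGenerator`, the update rule `state * 5 + 3 mod 16`, and the choice of which bits to expose were picked just so that the example is small enough to follow by hand.

```java
import java.util.Arrays;

class StatefulToyGenerator {
    private int state; // hidden 4-bit state in the range 0..15

    StatefulToyGenerator(int seed) {
        this.state = seed & 15;
    }

    // Advances the hidden state and exposes only its upper 3 bits (0..7).
    int next() {
        state = (state * 5 + 3) % 16;
        return state >>> 1;
    }

    // Counts how often each output value 0..7 appears in `draws` calls.
    static int[] countOutputs(int seed, int draws) {
        StatefulToyGenerator generator = new StatefulToyGenerator(seed);
        int[] counts = new int[8];
        for (int i = 0; i < draws; i++) {
            counts[generator.next()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // One full cycle of the 16 hidden states emits every output twice.
        System.out.println(Arrays.toString(countOutputs(0, 16))); // [2, 2, 2, 2, 2, 2, 2, 2]
    }
}
```

With seed 0 the first two outputs are both 1 (hidden states 3 and 2), so a value can even repeat back to back while the hidden state keeps moving forward.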

answered Oct 03 '22 11:10 by ted


The implementation of Random uses a simple linear congruential formula. Such formulae have a natural periodicity and all sorts of non-random patterns in the sequence they generate.

What you are seeing is an artefact of one of these patterns ... nothing deliberate. It is not an example of bias. Rather it is an example of auto-correlation.
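One such pattern can be reproduced with a hand-written copy of the documented formula. The sketch below assumes the generator described in the java.util.Random javadoc: a 48-bit state updated as `(state * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1)`, with `next(31)` returning the upper 31 bits. The class `LcgParityPeriod` and its methods are invented for illustration, and the rare rejection step inside `nextInt(int)` is ignored. The lowest bit of `next(31)` is bit 17 of the state, and the low 18 bits of such a state repeat after at most 2^18 steps. Because 18 is even, `bits % 18` has the same parity as `bits`, so the even/odd pattern in the question should repeat every 262144 draws.

```java
class LcgParityPeriod {
    // Constants as documented for java.util.Random.
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long INCREMENT = 0xBL;
    static final long MASK = (1L << 48) - 1;

    static long step(long state) {
        return (state * MULTIPLIER + INCREMENT) & MASK;
    }

    // Parity flags (true = even) of the upper 31 state bits for `draws` steps.
    static boolean[] parities(long seed, int draws) {
        boolean[] even = new boolean[draws];
        long state = seed & MASK;
        for (int i = 0; i < draws; i++) {
            state = step(state);
            int bits = (int) (state >>> 17); // same bits that next(31) would return
            even[i] = (bits & 1) == 0;
        }
        return even;
    }

    // True if the parity pattern repeats with `period` over `checks` positions.
    static boolean parityRepeats(long seed, int period, int checks) {
        boolean[] even = parities(seed, period + checks);
        for (int i = 0; i < checks; i++) {
            if (even[i] != even[i + period]) {
                return false;
            }
        }
        return true;
    }

    // Longest run of consecutive even values within `draws` steps.
    static int longestEvenRun(long seed, int draws) {
        int longest = 0;
        int current = 0;
        for (boolean isEven : parities(seed, draws)) {
            current = isEven ? current + 1 : 0;
            longest = Math.max(longest, current);
        }
        return longest;
    }

    public static void main(String[] args) {
        long seed = 12345L; // arbitrary starting state for the illustration
        System.out.println(parityRepeats(seed, 1 << 18, 1 << 18)); // prints "true"
        // Two full periods, so a run that straddles the wrap-around is not cut short.
        System.out.println(longestEvenRun(seed, 1 << 19));
    }
}
```

If this reading is right, it would also explain why every count in the question's statistics is a multiple of 256: the 2^26 trials cover the same 2^18-long parity pattern 256 times.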

If you need better (more "random") numbers, then you need to use SecureRandom rather than Random.
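A minimal sketch of that swap, assuming the default `SecureRandom` constructor is acceptable for your platform; the class name `SecureRandomDemo` and the helper `draw` are made up here. `SecureRandom` extends `Random`, so `nextInt(int)` is called in the same way.

```java
import java.security.SecureRandom;
import java.util.Arrays;

class SecureRandomDemo {
    // Draws `count` values in [0, bound) from a self-seeded SecureRandom.
    static int[] draw(int count, int bound) {
        SecureRandom random = new SecureRandom();
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        // Output differs from run to run, so no expected values are shown.
        System.out.println(Arrays.toString(draw(10, 18)));
    }
}
```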

And the answer to "why was it implemented that way?" is ... performance. A call to Random.nextInt can be completed in tens or hundreds of clock cycles. A call to SecureRandom is likely to be at least 2 orders of magnitude slower, possibly more.

answered Oct 03 '22 11:10 by Stephen C