Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Distribution of Random Numbers

I have two options of code:

Option 1

int myFunc() {
  return new Random().nextInt();
}

Or:

Option 2

private static final Random random = new Random();

int myFunc() {
  return random.nextInt();
}

I understand that option 2 is more idiomatic. I am wondering about the validity of option 1.

In option 1 I will only ever use the first number generated by a given seed. In option 2 I choose a seed and generate n numbers using that seed. IIUC the guarantees on randomness are on this use case.

My question is, therefore, if I call option 1 many times are there any guarantees on the uniformity of the distribution of the output?

like image 776
Benjy Kessler Avatar asked Jul 07 '16 10:07

Benjy Kessler


3 Answers

Quick Code:

// For occasional tasks that just need an average quality random number
ExecutorService threadPool = Executors.newCachedThreadPool();
threadPool.execute( () -> {
  ThreadLocalRandom.current().nextInt(); // Fast and unique!
} );


// For SecureRandom, high quality random number
final Random r = new SecureRandom();
ExecutorService threadPool = Executors.newCachedThreadPool();
threadPool.execute( () -> {
  r.nextInt(); // sun.security.provider.NativePRNG uses singleton.  Can't dodge contention.
} );


// Apache Common Math - Mersenne Twister - decent and non-singleton
int cpu = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool( cpu );
Map<Thread, RandomGenerator> random = new WeakHashMap<>( cpu, 1.0f );

executor.execute( ()-> {
   RandomGenerator r;
   synchronized ( random ) { // Get or create generator.
      r = random.get( Thread.currentThread() );
      if ( r == null ) random.put( Thread.currentThread(), r = new MersenneTwister() );
   }
   r.nextInt( 1000 );
} );

Explanation:

  1. Two Random of same seed will yield same numbers.
    1. Thus we will focus on whether we can guarantee different seeds.
  2. In theory, new Random() in each thread does not guarantee different seed.

    1. new Random is seeded by nanoTime and a "unique" number.
    2. The number is not guaranteed unique because its calculation is not synchronized.
    3. As for nanoTime, it guarantees to be "at least at good as currentTimeMillis"
    4. currentTimeMillis does not guarantee anything and can be pretty coarse.
    5. In real life, the two times are the same only on old linux systems and Win 98.
  3. In practice, new Random() in each thread basically always get different seeds.

    1. Creating thread is expensive. Mine creates 1 per 50,000ns. And that's not slow.
    2. 50µs is way above nanoTime's common granularities of up to a few ten ns.
    3. The unique number calculation (1.2) is also fast, so getting same number is very rare.
    4. Use Executors to create a thread pool to avoid the heavy new thread overhead.
  4. zapl suggested ThreadLocalRandom.current().nextInt(). Great idea.

    1. It does not create new Random, but it is also a linear congruential generator.
    2. It generate a new random for each call thread as that thread's seed.
    3. It is built to be very fast in multi-thread. (See notes below.)
    4. It is statically seeded by SecureRandom, which produce better quality random numbers.
  5. "uniformally distributed" is just one small part of randomness tests.

    1. Random is somewhat uniform, and its result can be predicted given just two values.
    2. SecureRandom guarantees this won't happens. (i.e. cryptographically strong)
    3. There is no risk of seed collision if you create a new SecureRandom in each thread.
    4. But currently its source is single thread anyway, no parallel generation.
    5. For a good RNG that supports multi-thread, find external help like Apache Common's MT.

Note: Implementation details deduced from Java 8 source code. Future Java version may change; for example, ThreadLocalRandom is using sun.misc.Unsafe to store the seeds, which may be removed in Java 9 forcing ThreadLocalRandom to find a new way to work without contention.

like image 104
Sheepy Avatar answered Nov 10 '22 04:11

Sheepy


My real question is whether option 1 is mathematically valid.

Lets start with option 2. The random number generator used by java.util.Random is specified in the javadoc as follows:

The class uses a 48-bit seed, which is modified using a linear congruential formula. (See Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1.)

and there is more specific detail in the various methods' javadocs.

But the point is that we are using a sequence generated by a linear congruential formula, and such formulae have a significant degree of auto-correlation ... which could be problematic.

Now with option 1, you are using a different Random instance with a new seed each time, and applying one round of the LC formula. Thus you are getting a sequence of numbers that are likely to be autocorrelated with the seeds. However, the seeds are generated in different ways, depending on the Java version.

Java 6 does this:

 public Random() { this(++seedUniquifier + System.nanoTime()); }
 private static volatile long seedUniquifier = 8682522807148012L;

... which is not very random at all. If you created Random instances at a constant interval, the seeds are likely to be closely spaced, and therefore the sequence of random numbers produced by your option #1 are liable to be auto-correlated.

By contrast, Java 7 and 8 do this:

 public Random() {
     this(seedUniquifier() ^ System.nanoTime());
 }

 private static long seedUniquifier() {
     // L'Ecuyer, "Tables of Linear Congruential Generators of
     // Different Sizes and Good Lattice Structure", 1999
     for (;;) {
         long current = seedUniquifier.get();
         long next = current * 181783497276652981L;
         if (seedUniquifier.compareAndSet(current, next))
             return next;
     }
 }

 private static final AtomicLong seedUniquifier
     = new AtomicLong(8682522807148012L);

The sequence of seeds produced by the above are likely be a much better approximation to (true) randomness. That probably makes your option #1 superior to option #2.

The downside of your option #1 in Java 6 through 8 is that the System.nanoTime() probably call involves a system call. That is relatively expensive.


So the short answer is that it is Java version specific which of option #1 and option #2 produces better quality "random" numbers ... from a mathematical perspective.

In both cases, the distribution of numbers will be uniform over a large enough sample size, though I'm not sure it is meaningful to talk about probability distributions when the process is deterministic.

However, neither approach would be suitable as a "crypto strength" random number generator.

like image 45
Stephen C Avatar answered Nov 10 '22 02:11

Stephen C


No.

There are no guarantees on the properties of the distribution of numbers that will be produced by Option 1. As has been made clear in other answers, the implementation of the constructor for java.util.Random depends on the system time. Therefore, in order to make a guarantee on the properties of the distribution of numbers you get with Option 1, you would need to be able to make guarantees about the distribution of numbers produced by the calls your program makes to get the system time on any platform where the program will be running.

With option 2, however, there are mathematical guarantees that can be made about the distribution of numbers that will be produced during one execution of the program. With a linear congruential generator (the pseudorandom number generation algorithm used by java.util.Random) some of the properties of randomness are not quite as good as with other algorithms, but the distribution is guaranteed to be relatively uniform.

This does not necessarily mean that Option 1 cannot serve your purposes. It depends on what you are doing.

like image 45
Evan VanderZee Avatar answered Nov 10 '22 04:11

Evan VanderZee