Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do people say there is modulo bias when using a random number generator?

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand() in C++.

like image 965
user1413793 Avatar asked Jun 11 '12 17:06

user1413793


People also ask

Do random number generators have a bias?

Instead, random-number generators that are often used to produce such sequences tend to cluster zeros together, introducing a bias.

Can a random number generator be manipulated?

With some random number generators, it's possible to select the seed carefully to manipulate the output. Sometimes this is easy to do. Sometimes it's hard but doable. Sometimes it's theoretically possible but practically impossible.

Can random number generators be predicted?

Surprisingly, the general-purpose random number generators that are in most widespread use are easily predicted. (In contrast RNGs used to construct stream ciphers for secure communication are believed to be infeasible to predict, and are known as cryptographically secure).

Why would you use a random number generator?

Random numbers are important for computer encryption, lotteries, scientific modelling, and gambling. Current methods of generating random numbers can produce predictable results. Researchers said the new method could generate higher-quality random numbers with less computer processing.


2 Answers

So rand() is a pseudo-random number generator which chooses a natural number between 0 and RAND_MAX, which is a constant defined in cstdlib (see this article for a general overview on rand()).

Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!

When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11

When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11

When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11

This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue but for a larger range this could skew the distribution, biasing the smaller numbers.

So when does rand()%n return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1. In this case, along with our earlier assumption rand() does return a number between 0 and RAND_MAX with equal probability, the modulo classes of n would also be equally distributed.

So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:

int x;  do {     x = rand(); } while (x >= n); 

but that's inefficient for low values of n, since you only have a n/RAND_MAX chance of getting a value in your range, and so you'll need to perform RAND_MAX/n calls to rand() on average.

A more efficient formula approach would be to take some large range with a length divisible by n, like RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in the range, and then take the modulus:

int x;  do {     x = rand(); } while (x >= (RAND_MAX - RAND_MAX % n));  x %= n; 

For small values of n, this will rarely require more than one call to rand().


Works cited and further reading:

  • CPlusPlus Reference

  • Eternally Confuzzled


like image 146
user1413793 Avatar answered Sep 28 '22 05:09

user1413793


Keep selecting a random is a good way to remove the bias.

Update

We could make the code fast if we search for an x in range divisible by n.

// Assumptions // rand() in [0, RAND_MAX] // n in (0, RAND_MAX]  int x;   // Keep searching for an x in a range divisible by n  do {     x = rand(); } while (x >= RAND_MAX - (RAND_MAX % n))   x %= n; 

The above loop should be very fast, say 1 iteration on average.

like image 41
Nick Dandoulakis Avatar answered Sep 28 '22 05:09

Nick Dandoulakis