Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get a random number from 0 to N-1 with a "same probability" in C? [duplicate]

I know this might be a "old" question, but I want to focus on the probability.

My first question is: in C, rand() will give a number from 0 to RAND_MAX, does each number in this interval have the same probability to be chosen by rand()?

The second question: if rand() lets each number from 0 to RAND_MAX to have the same (or approximately same) probability to be chosen, when I want to get a random number from 0 to N-1 (N-1 < RAND_MAX), I'll do this generally:

rand()%N

But if RAND_MAX is NOT the multiple of N, the probability of random number chosen from 0 to N-1 might not be same

For instance, suppose RAND_MAX=150 and N=100, when I do rand()%100, the number from 0 to 49 will have a higher probability to be chosen than the number from 50 to 99 because 150 is not the multiple of 100.

Is there a algorithm or function in C, which can let each random number have the same probability to be chosen?

like image 883
Andwyer Avatar asked Oct 20 '22 15:10

Andwyer


2 Answers

Leaving aside the fact that rand()%N is a very bad way to get a random number in the range 0..N-1, the question you ask is quite simple. Find the largest number, M say, that satisfies both M <= RAND_MAX and M % N == 0. Then when you call rand(), reject a value if it is >= M and call rand() again until you get a value that is < M.

But this particular nuance is pointless because rand()%N will be hopelessly biased. You need to use as many of the bits returned by in rand() as you possibly can.

like image 185
David Heffernan Avatar answered Oct 27 '22 00:10

David Heffernan


Assuming rand() itself is evenly distributed (not always a good assumption.), call rand() again as needed.

Based on Trying to take 1 random digit at a time

#include <stdlib.h>
int rand_Flat_Distribution_0_to_Nm1(int N) {
  assert(N <= RAND_MAX);
  assert(N > 0);
  int rmax = RAND_MAX - (RAND_MAX % N) - 1;
  int r;
  while ((r = rand()) > rmax);
  return r%N;
}

Example: If RAND_MAX was 32767 and N was 100, rmax would have the value of 32699. Any random value in the range 32700 to 32767 would get tossed and a new random value would be fetched, eliminated the bias %N typically causes.

This does not compensate for deficiencies in rand(). C does not specify the quality of rand(), just that it generates values 0 to RAND_MAX and RAND_MAX is a least 32767.

For values of N greater than RAND_MAX, a different solution is needed.

like image 34
chux - Reinstate Monica Avatar answered Oct 27 '22 00:10

chux - Reinstate Monica