I know this might be a "old" question, but I want to focus on the probability.
My first question is:
in C, rand()
will give a number from 0
to RAND_MAX
, does each number in this interval have the same probability to be chosen by rand()
?
The second question:
if rand()
lets each number from 0
to RAND_MAX
to have the same (or approximately same) probability to be chosen, when I want to get a random number from 0 to N-1 (N-1 < RAND_MAX), I'll do this generally:
rand()%N
But if RAND_MAX
is NOT the multiple of N, the probability of random number chosen from 0 to N-1 might not be same
For instance, suppose RAND_MAX=150 and N=100, when I do rand()%100
, the number from 0 to 49 will have a higher probability to be chosen than the number from 50 to 99 because 150 is not the multiple of 100.
Is there a algorithm or function in C, which can let each random number have the same probability to be chosen?
Leaving aside the fact that rand()%N
is a very bad way to get a random number in the range 0..N-1
, the question you ask is quite simple. Find the largest number, M say, that satisfies both M <= RAND_MAX
and M % N == 0
. Then when you call rand()
, reject a value if it is >= M
and call rand()
again until you get a value that is < M
.
But this particular nuance is pointless because rand()%N
will be hopelessly biased. You need to use as many of the bits returned by in rand()
as you possibly can.
Assuming rand()
itself is evenly distributed (not always a good assumption.), call rand()
again as needed.
Based on Trying to take 1 random digit at a time
#include <stdlib.h>
int rand_Flat_Distribution_0_to_Nm1(int N) {
assert(N <= RAND_MAX);
assert(N > 0);
int rmax = RAND_MAX - (RAND_MAX % N) - 1;
int r;
while ((r = rand()) > rmax);
return r%N;
}
Example:
If RAND_MAX
was 32767 and N
was 100, rmax
would have the value of 32699. Any random value in the range 32700 to 32767 would get tossed and a new random value would be fetched, eliminated the bias %N
typically causes.
This does not compensate for deficiencies in rand()
. C does not specify the quality of rand()
, just that it generates values 0
to RAND_MAX
and RAND_MAX
is a least 32767.
For values of N
greater than RAND_MAX
, a different solution is needed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With