I've been looking into the int rand() function from <stdlib.h> in C11 when I stumbled over the following cppreference example for rolling a six-sided die.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srand(time(NULL)); // use current time as seed for random generator
    int random_variable = rand();
    printf("Random value on [0,%d]: %d\n", RAND_MAX, random_variable);

    // roll a 6-sided die 20 times
    for (int n=0; n != 20; ++n) {
        int x = 7;
        while(x > 6)
            x = 1 + rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
        printf("%d ", x);
    }
}
Specifically this part:
[...]
while(x > 6)
    x = 1 + rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
[...]
Questions:
1. Why the addition of + 1u? Since rand() is [0,RAND_MAX], I'm guessing that doing rand()/(RAND_MAX/6) -> [0,RAND_MAX/(RAND_MAX/6)] -> [0,6]? And since it's integer division, (LARGE/(LARGE+small)) < 1 -> 0, so adding 1u gives it the required range of [0,5]?
2. Building on the previous question, assuming [0,5], 1 + (rand()/((RAND_MAX+1u)/6)) should only go through [1,6] and never trigger a second loop?
3. I've been poking around to see if rand() has ever returned float, but that would seem like a pretty huge break for old code? I guess the check makes sense if you add 1.0f instead of 1u, making it a floating-point division?
I'm trying to wrap my head around this and have a feeling that I might be missing something...
(P.S. This is not the basis for anything security-critical; I'm just exploring the standard library. D.S.)
The code avoids bias by ensuring that each possible result in [1, 6] comes from exactly the same number of return values from rand.
By definition, rand returns int values from 0 to RAND_MAX, so there are 1+RAND_MAX possible values it can return. (That count is why the code adds 1; the u suffix makes the addition unsigned, so it cannot overflow on implementations where RAND_MAX equals INT_MAX.) If 1+RAND_MAX is not a multiple of 6, then it is impossible to partition it into 6 exactly equal intervals of integers. So the code partitions it into 6 equal intervals that are as big as possible and one odd-size fragment interval. Then the results of rand are mapped into these intervals: the first six intervals correspond to results from 1 to 6, and the last interval is rejected, in which case the code tries again.
When we divide 1+RAND_MAX by 6, there is some quotient q and some remainder r, so 1+RAND_MAX = 6q + r with 0 ≤ r < 6. This q is exactly the divisor (RAND_MAX + 1u)/6 in the code. Now consider the result of rand() / q:
- If rand produces a number in [0, q−1], rand() / q will be 0.
- If rand produces a number in [q, 2q−1], rand() / q will be 1.
- If rand produces a number in [2q, 3q−1], rand() / q will be 2.
- If rand produces a number in [3q, 4q−1], rand() / q will be 3.
- If rand produces a number in [4q, 5q−1], rand() / q will be 4.
- If rand produces a number in [5q, 6q−1], rand() / q will be 5.
- If rand produces a number that is 6q or greater, rand() / q will be 6.

Observe that in each of the first six intervals, there are exactly q numbers. In the seventh interval, the possible return values are in [6q, RAND_MAX]. That interval contains r numbers.
This code works by rejecting that last interval:
int x = 7;
while(x > 6)
    x = 1 + rand()/((RAND_MAX + 1u)/6);
Whenever rand produces a number in that last fragmentary interval, x becomes 1 + 6 = 7, so the while(x > 6) condition still holds and the code tries again. When rand produces a number in one of the whole intervals, x is at most 6, so the code accepts it and exits (the added 1 makes the results in x run from 1 to 6 instead of 0 to 5).
Thus, every output from 1 to 6, inclusive, is produced by exactly the same number of rand values.
This is the best way to produce a uniform distribution from rand in the sense that it has the fewest rejections, given that we are using a scheme like this.¹ The range of rand has been split into six intervals that are as big as possible. The remaining fragmentary interval cannot be used because the remainder r is less than six, so the r unused values cannot be split evenly over the six desired values for x.
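To put a number on "fewest rejections", with q and r as defined above, each rand call is rejected with probability r/(1 + RAND_MAX), and the number of calls per roll follows a geometric distribution. The example values assume RAND_MAX is 32767 (the standard's minimum) or 2147483647 (the value glibc uses); other implementations will differ.

```latex
\begin{aligned}
P(\text{reject}) &= \frac{r}{1 + \texttt{RAND\_MAX}}, \qquad
E[\text{calls per roll}] = \frac{1}{1 - P(\text{reject})} = \frac{1 + \texttt{RAND\_MAX}}{6q} \\[6pt]
\texttt{RAND\_MAX} = 32767 &: \quad 32768 = 6 \cdot 5461 + 2, \quad P(\text{reject}) = \tfrac{2}{32768} \approx 6.1 \times 10^{-5} \\
\texttt{RAND\_MAX} = 2147483647 &: \quad 2^{31} = 6 \cdot 357913941 + 2, \quad P(\text{reject}) = \tfrac{2}{2^{31}} \approx 9.3 \times 10^{-10}
\end{aligned}
```

In both cases the loop almost always finishes after a single call to rand.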
¹ This is not necessarily the best way to use rand to generate random numbers in [1, 6] overall. For example, from a single rand call with RAND_MAX equal to 32767, we could view the value as a base-six numeral from 000000 to 411411. If it is under 400000, we can take the last five digits, which are each uniformly distributed in [0, 5], and adding one gets us the desired [1, 6]. If it is in [400000, 410000), we can use the last four digits. If it is in [410000, 411000), we can use the last three, and so on. Additionally, the otherwise discarded information, such as the leading digit, might be pooled over multiple rand calls to increase the average number of outputs we get per call to rand.