
<stdlib.h> rand() example code, unnecessary check for larger than max?

Tags: c, random, c11

I was looking into the int rand() function from <stdlib.h> in C11 when I stumbled upon the following cppreference example for rolling a six-sided die.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
int main(void)
{
    srand(time(NULL)); // use current time as seed for random generator
    int random_variable = rand();
    printf("Random value on [0,%d]: %d\n", RAND_MAX, random_variable);
 
    // roll a 6-sided die 20 times
    for (int n=0; n != 20; ++n) {
        int x = 7;
        while(x > 6) 
            x = 1 + rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
        printf("%d ",  x); 
    }
}

Specifically this part:

[...]
        while(x > 6) 
            x = 1 + rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
[...]

Questions:

  1. Why the addition of + 1u? Since rand() is [0,RAND_MAX] I'm guessing that doing rand()/(RAND_MAX/6) -> [0,RAND_MAX/(RAND_MAX/6)] -> [0,6]? And since it's integer division (LARGE/(LARGE+small)) < 1 -> 0, adding 1u gives it the required range of [0,5]?

  2. Building on the previous question, assuming [0,5], 1 + (rand()/((RAND_MAX+1u)/6)) should only go through [1,6] and never trigger a second loop?

Been poking around to see if rand() has returned float at some point, but that seems like a pretty huge breakage towards old code? I guess the check makes sense if you add 1.0f instead of 1u making it a floating point division?

I'm trying to wrap my head around this and have a feeling that I might be missing something.

(P.S. This is not a basis for anything security critical, I'm just exploring the standard library.)

GlassShark asked Oct 10 '19 11:10


1 Answer

The code avoids bias by ensuring each possible result in [1, 6] is the output from exactly the same number of return values from rand.

By definition, rand returns int values from 0 to RAND_MAX. So there are 1+RAND_MAX possible values it can return. If 1+RAND_MAX is not a multiple of 6, then it is impossible to partition it into 6 exactly equal intervals of integers. So the code partitions it into 6 equal intervals that are as big as possible and one odd-size fragment interval. Then the results of rand are mapped into these intervals: The first six intervals correspond to results from 1 to 6, and the last interval is rejected, and the code tries again.
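As a minimal sketch of that partition (the names `partition` and `split_range` are hypothetical, not from the cppreference example), the interval size and the leftover fragment size can be computed from the number of possible values. The `1u` in `RAND_MAX + 1u` makes the addition unsigned; on implementations where RAND_MAX equals INT_MAX, `RAND_MAX + 1` in int arithmetic would overflow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: how `count` values split into `sides` intervals. */
struct partition {
    unsigned interval_size; /* size of each whole interval */
    unsigned leftover;      /* size of the odd-size fragment at the end */
};

/* Split `count` possible values into `sides` equal intervals plus a fragment. */
static struct partition split_range(unsigned count, unsigned sides)
{
    assert(sides > 0 && count >= sides);
    struct partition p = { count / sides, count % sides };
    return p;
}
```

For the die, the call would be `split_range(RAND_MAX + 1u, 6)`. Assuming RAND_MAX is 32767 (the smallest value the standard allows), that gives an interval size of 5461 and a leftover of 2.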

When we divide 1+RAND_MAX by 6, there is some quotient q and some remainder r. Now consider the result of rand() / q:

  • When rand produces a number in [0, q−1], rand() / q will be 0.
  • When rand produces a number in [q, 2q−1], rand() / q will be 1.
  • When rand produces a number in [2q, 3q−1], rand() / q will be 2.
  • When rand produces a number in [3q, 4q−1], rand() / q will be 3.
  • When rand produces a number in [4q, 5q−1], rand() / q will be 4.
  • When rand produces a number in [5q, 6q−1], rand() / q will be 5.
  • When rand produces a number that is 6q or greater, rand() / q will be 6.

Observe that in each of the first six intervals, there are exactly q numbers. In the seventh interval, the possible return values are in [6q, RAND_MAX]. That interval contains r numbers.
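The counts above can be checked by brute force. The following is only a sketch with hypothetical helper names; it is meant for a small range such as [0, 32767], not for a large RAND_MAX, since it visits every value once.

```c
#include <assert.h>
#include <limits.h>

/* Index of the interval that `value` falls into when intervals have size q.
   Indices 0..5 are the six whole intervals; index 6 is the leftover fragment. */
static unsigned interval_index(unsigned value, unsigned q)
{
    assert(q > 0);
    return value / q;
}

/* Count how many values in [0, max_value] land in each interval index.
   counts must have room for 7 entries. */
static void count_intervals(unsigned max_value, unsigned counts[7])
{
    assert(max_value < UINT_MAX);      /* so max_value + 1u cannot wrap */
    unsigned q = (max_value + 1u) / 6;
    assert(q >= 6);                    /* guarantees the index never exceeds 6 */

    for (int i = 0; i < 7; ++i)
        counts[i] = 0;
    for (unsigned v = 0; v <= max_value; ++v)
        ++counts[interval_index(v, q)];
}
```

Calling `count_intervals(32767u, counts)` should leave 5461 in each of the first six entries and 2 in the seventh, matching q and r for that range.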

This code works by rejecting that last interval:

int x = 7;
while(x > 6) 
    x = 1 + rand()/((RAND_MAX + 1u)/6);

Whenever rand produces a number in that last fragmentary interval, this code rejects it and tries again. When rand produces a number in one of the whole intervals, this code accepts it and exits (after adding 1 so the results in x are 1 to 6 instead of 0 to 5).

Thus, each output from 1 to 6, inclusive, is produced by exactly the same number of rand values.
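The same rejection scheme can be written as a small reusable function. This is a sketch, not part of the cppreference example; `face_or_reject` and `roll` are hypothetical names, and the mapping step is separated out so it can be checked without randomness.

```c
#include <assert.h>
#include <stdlib.h>

/* Map one raw rand() value to a face in [1, sides], or return 0 to signal
   that the value fell in the leftover fragment and should be rejected. */
static unsigned face_or_reject(unsigned raw, unsigned sides)
{
    assert(sides > 0 && sides <= RAND_MAX);
    unsigned q = (RAND_MAX + 1u) / sides; /* size of each whole interval */
    unsigned face = 1u + raw / q;
    return face > sides ? 0u : face;
}

/* Roll a die with the given number of sides, retrying until a value is accepted. */
static unsigned roll(unsigned sides)
{
    unsigned face;
    do {
        face = face_or_reject((unsigned)rand(), sides);
    } while (face == 0u);
    return face;
}
```

After seeding with `srand`, `roll(6)` returns a value in [1, 6]. The loop usually runs once, because at most 5 of the 1+RAND_MAX possible values are rejected.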

This is the best way to produce a uniform distribution from rand in the sense that it has the fewest rejections, given we are using a scheme like this (see footnote 1). The range of rand has been split into six intervals that are as big as possible. The remaining fragmentary interval cannot be used because the remainder r is less than six, so the r unused values cannot be split evenly over the six desired values for x.

Footnote

1 This is not necessarily the best way to use rand to generate random numbers in [1, 6] overall. For example, from a single rand call with RAND_MAX equal to 32767, we could view the value as a base-six numeral from 000000 to 411411. If it is under 400000, we can take the last five digits, which are each uniformly distributed in [0, 5], and adding one gets us the desired [1, 6]. If it is in [400000, 410000), we can use the last four digits. If it is in [410000, 411000), we can use the last three, and so on. Additionally, the otherwise discarded information, such as the leading digit, might be pooled over multiple rand calls to increase the average number of outputs we get per call to rand.
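The base-six idea in the footnote might be sketched as follows. The function name `faces_from_value` and its interface are hypothetical, and the pooling of discarded information across calls is not attempted. The sketch assumes `value` is drawn uniformly from [0, count − 1], for example `value = rand()` and `count = RAND_MAX + 1u`.

```c
#include <assert.h>

/* Hypothetical helper: extract as many unbiased die faces as possible from
   one value drawn uniformly from [0, count - 1]. Writes faces in [1, 6] to
   out (12 entries are enough for any 32-bit count) and returns how many
   were written; 0 means the value must be discarded. */
static int faces_from_value(unsigned value, unsigned count, int out[])
{
    assert(count > 0 && value < count);

    /* Largest power of six not exceeding count, and its number of digits. */
    unsigned pow6 = 1;
    int digits = 0;
    while (pow6 <= count / 6) {
        pow6 *= 6;
        ++digits;
    }

    while (digits > 0) {
        /* Largest multiple of pow6 that fits in the remaining range. */
        unsigned block = (count / pow6) * pow6;
        if (value < block) {
            /* Inside this block the low `digits` base-six digits are uniform. */
            for (int i = 0; i < digits; ++i) {
                out[i] = 1 + (int)(value % 6);
                value /= 6;
            }
            return digits;
        }
        /* Otherwise move into the leftover fragment and try one digit fewer. */
        value -= block;
        count -= block;
        pow6 /= 6;
        --digits;
    }
    return 0;
}
```

With `count` equal to 32768, values below 31104 (400000 in base six) yield five faces, values in [31104, 32400) yield four, and so on down to the last two values, 32766 and 32767, which yield none.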

Eric Postpischil answered Nov 13 '22 22:11