Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I use "rand % N" or "rand() / (RAND_MAX / N + 1)"?

I was reading the C FAQ and found out in a question that it recommends me to use rand() / (RAND_MAX / N + 1) instead of the more popular way which is rand() % N.

The reasoning for that is that when N is a low number rand() % N will only use a few bits from rand().

I tested the different approaches with N being 2 on both Windows and Linux but could not notice a difference.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 2

int main(void)
{
    srand(0);
    printf("rand() %% N:\n");
    for (int i = 0; i < 40; ++i) {
        printf("%d ", rand() % N);
    }
    putchar('\n');

    srand(0);
    printf("rand() / (RAND_MAX / N + 1):\n");
    for (int i = 0; i < 40; ++i) {
        printf("%d ", rand() / (RAND_MAX / N + 1));
    }
    putchar('\n');

    return 0;
}

The output is this (on my gnu/linux machine):

rand() % N:
1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 
rand() / (RAND_MAX / N + 1):
1 0 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 0 1 

Both alternatives seem perfectly random to me. It even seems like the second approach is worse than rand % N.

Should I use rand() % N or rand() / (RAND_MAX / N + 1)?

like image 343
wefwefa3 Avatar asked Dec 30 '14 13:12

wefwefa3


1 Answers

If N is a power of two, using the remainder technique is usually safe (RAND_MAX is usually a power of two minus 1, so the entire range has a power of two length). More generally, N has to divide the range of rand() in order to avoid the bias.

Otherwise, you run into this problem, regardless of the quality of rand(). In short, the problem is that you're chopping that range into a number of "parts" each of length N, if N does not divide the range then the last part will not be complete. The numbers that got "cut off" from that part are therefore less likely to occur, since they have one fewer "part" they can be generated from.

Unfortunately rand() / (RAND_MAX / N + 1) is also broken (in almost the same way), so the real answer is: don't use either of them.

The problem as outlined above is really fundamental, there is no way to evenly distribute X different values over Y results unless Y divides X. You can fix it by rejecting a part of the random samples, to make Y divide the new X.

like image 194
harold Avatar answered Oct 04 '22 20:10

harold