
uniformly distributed random number generation

Tags:

c++

math

Why does this code generate uniformly distributed numbers? I'm having some difficulty understanding it. Could someone explain? Thanks.

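#include <cstdlib>  // std::rand, RAND_MAX

// Returns a value in [0, n); assumes 1 <= n <= RAND_MAX.
int RandomUniform(int n) {
  // Largest rand() value that will be accepted.
  int top = ((((RAND_MAX - n) + 1) / n) * n - 1) + n;
  int r;
  do {
    r = std::rand();
  } while (r > top);  // discard draws above top and try again
  return (r % n);
}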

Update: I do understand why rand() % n doesn't give you a uniformly distributed sequence. My question is why top is computed as

top = ((((RAND_MAX - n) + 1) / n) * n - 1) + n;

What's the concern here? I think a simple top = RAND_MAX / n * n would do.

JASON asked Feb 04 '13 15:02




2 Answers

The function assumes that rand() is uniformly distributed; whether or not that is a valid assumption depends on the implementation of rand().

Given a uniform rand(), we can get a random number in the range [0,n) by calculating rand()%n. However, in general, this won't be quite uniform. For example, suppose n is 3 and RAND_MAX is 7:

rand()      0 1 2 3 4 5 6 7
rand() % n  0 1 2 0 1 2 0 1

We can see that 0 and 1 come up with a probability of 3/8, while 2 only comes up with a probability of 2/8: the distribution is not uniform.

Your code discards any value of rand() greater than or equal to the largest multiple of n that it can generate. Now each value has an equal probability:

rand()      0 1 2 3 4 5 6 7
rand() % n  0 1 2 0 1 2 X X

So 0, 1 and 2 each come up with probability 1/3, as long as we are not so unlucky that the loop never terminates.

Regarding your update:

I think a simple top = RAND_MAX / n * n would do.

If RAND_MAX were an exclusive bound (one more than the actual maximum), that would be correct. Since it's an inclusive bound, we need to add one to get the exclusive bound; and since the loop compares with > against an inclusive bound, we subtract one again after the calculation:

int top = ((RAND_MAX + 1) / n) * n - 1;

However, if RAND_MAX were equal to INT_MAX, then RAND_MAX + 1 would overflow; to avoid that, subtract n at the beginning of the calculation and add it back at the end:

int top = (((RAND_MAX - n) + 1) / n) * n - 1 + n;
Mike Seymour answered Sep 20 '22 01:09


The underlying problem is this: suppose you have a random number generator my_rand() that produces values from 0 to 6, inclusive, and you want to generate values from 0 to 5, inclusive. If you run your generator and return my_rand() % 6, you won't get a uniform distribution. When my_rand() returns 0, you get 0; when it returns 1, you get 1, and so on, until my_rand() returns 6; in that case my_rand() % 6 is 0. So overall, my_rand() % 6 will return 0 twice as often as any other value.

The way to fix this is to not use values greater than 5: instead of just returning my_rand() % 6, you write a loop that discards values from my_rand() that are too large. That's essentially what the code in the question is doing. I haven't traced it through, but the usual implementation is to compute the largest multiple of n that is less than or equal to RAND_MAX + 1 (the number of distinct values rand() can return), and whenever rand() returns a value that's greater than or equal to that multiple, go back and get a new value.

Pete Becker answered Sep 22 '22 01:09