 

Use rand() to generate uniformly distributed floating point numbers on (a,b), [a,b), (a,b], and [a,b]

Tags: c, random

I want to collect the "best" way to generate random numbers on all four types of intervals in one place. I'm sick of Googling this. Search results turn up a lot of crap. Even the relevant results are pages or blogs that are often flat-out wrong or have discussions where self-appointed experts disagree with each other over some technicality, often with their "answers" seemingly exposing that they do not know about the different types (closed, open, semi-open) of intervals. I'm sick of reading bad information about generating random numbers in C for such a "simple" question.

Please show me how to generate uniformly distributed floating point numbers. Here is my typical way (using "long double" as an example) on (a,b), [a,b), (a,b], and [a,b]:

long double a=VALUE1,b=VALUE2;
long double x1,x2,x3,x4;

srand((unsigned)time(NULL));

/* x1 will be an element of [a,b] */
x1=((long double)rand()/RAND_MAX)*(b-a) + a;

/* x2 will be an element of [a,b) */
x2=((long double)rand()/((long double)RAND_MAX+1))*(b-a) + a;

/* x3 will be an element of (a,b] */
x3=(((long double)rand()+1)/((long double)RAND_MAX+1))*(b-a) + a;

/* x4 will be an element of (a,b) */    
x4=(((long double)rand()+1)/((long double)RAND_MAX+2))*(b-a) + a;

For the special case of the unit intervals (0,1), [0,1), (0,1], and [0,1]:

long double x1,x2,x3,x4;

srand((unsigned)time(NULL));

/* x1 will be an element of [0,1] */
x1=((long double)rand()/RAND_MAX);

/* x2 will be an element of [0,1) */
x2=((long double)rand()/((long double)RAND_MAX+1));

/* x3 will be an element of (0,1] */
x3=(((long double)rand()+1)/((long double)RAND_MAX+1));

/* x4 will be an element of (0,1) */    
x4=(((long double)rand()+1)/((long double)RAND_MAX+2));

I believe the casts on both RAND_MAX and the return value of rand() are necessary, not only because we want to avoid integer division but because they are ints and otherwise adding one (or two) might overflow them.
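The two reasons for the casts can be seen side by side in a small sketch. The helper names are hypothetical, and double stands in for long double; the point is only where the conversion happens relative to the arithmetic.

```c
#include <stdlib.h>

/* Integer division: the result is 0 for every r except r == RAND_MAX. */
static int unit_int_division(int r)
{
    return r / RAND_MAX;
}

/* Converting before the arithmetic makes both the division and the "+1"
   happen in double. Writing RAND_MAX + 1 in int arithmetic instead would
   overflow (undefined behaviour) where RAND_MAX == INT_MAX. */
static double unit_half_open(int r)
{
    return (double)r / ((double)RAND_MAX + 1.0);
}
```

Converting one operand of the division is enough to avoid integer division; it is the cast on RAND_MAX itself (and on rand() in the x3 and x4 forms) that keeps the "+1" and "+2" out of int arithmetic.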

I think that the versions for "double" and "float" are exactly the same, just with the type replaced. Are there any subtleties that arise for the different floating-point types?

Do you see any problems with the above implementations? If so, what and how would you fix it?

EDIT: The above implementations pass necessary tests for them to be correct (at least on a 64-bit Intel Core 2 Duo machine running 64-bit Linux): x1 can generate both 0 and 1, x2 can generate 0 but hasn't been seen to generate 1, x3 can generate 1 but hasn't been seen to generate 0, and x4 hasn't been seen to generate either 0 or 1.
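Because each of the four unit-interval expressions is non-decreasing in the value returned by rand(), its smallest and largest possible results come from r = 0 and r = RAND_MAX. Checking those two inputs directly settles which endpoints can occur, instead of relying on not having observed them. The sketch below rewrites the question's expressions as hypothetical functions of a given r; it says nothing about how often intermediate values occur.

```c
#include <stdlib.h>

/* The question's four unit-interval maps, written as functions of a given
   rand() result r so that the extreme inputs can be checked directly. */
static long double closed_01(int r)      { return (long double)r / RAND_MAX; }
static long double half_open_01(int r)   { return (long double)r / ((long double)RAND_MAX + 1); }
static long double half_closed_01(int r) { return ((long double)r + 1) / ((long double)RAND_MAX + 1); }
static long double open_01(int r)        { return ((long double)r + 1) / ((long double)RAND_MAX + 2); }
```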

asked Sep 07 '12 18:09 by SO Stinks




2 Answers

If you want every double in the range to be possible, with probability proportional to the difference between it and its adjacent double values, then it's actually really hard.

Consider the range [0, 1000]. There is an absolute bucketload of values in the very tiny first part of the range: far more than a million of them between 0 and 1000000*DBL_MIN, and DBL_MIN is about 2.2 * 10^-308. There are more than 2^32 values in the range altogether, so clearly one call to rand() isn't enough to generate them all. What you'd need to do is generate the mantissa of your double uniformly, select an exponent with an exponential distribution, and then fudge things a bit to ensure the result is in range.
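A minimal sketch of that idea, restricted to [0, 1) and assuming IEEE 754 binary64 doubles with subnormals enabled: a fair coin walks down the binades [2^-1, 1), [2^-2, 2^-1), ..., each half as likely as the one above because it is half as wide, and inside the chosen binade the 52 fraction bits are drawn uniformly. The bit source random_bits is a hypothetical helper built on rand() under the RAND_MAX assumption stated in its comment. Mapping the result onto a general [a, b] (the "fudge" step) is not attempted here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns n uniformly random bits (1 <= n <= 63).
   Assumes RAND_MAX + 1 is a power of two of at least 2^15, so the low
   15 bits of every rand() result are uniformly distributed. */
static uint64_t random_bits(int n)
{
    uint64_t r = 0;
    for (int have = 0; have < n; have += 15)
        r = (r << 15) | (uint64_t)(rand() & 0x7FFF);
    return r & ((UINT64_C(1) << n) - 1);
}

/* Returns a double in [0, 1). Every double in that range can occur, with
   probability equal to its distance to the next larger double.
   Assumes IEEE 754 binary64 with subnormals (no flush-to-zero). */
static double uniform_all_doubles(void)
{
    double scale = 0.5;   /* current binade is [scale, 2 * scale) */
    int e = -1;           /* scale == 2^e */
    while (random_bits(1) == 0) {   /* fair coin: 1 keeps this binade, 0 moves down one */
        scale *= 0.5;
        if (--e < -1022) {
            /* Below 2^-1022 lie the subnormals, evenly spaced 2^-1074 apart:
               a uniform 52-bit integer picks one of them. */
            return (double)random_bits(52) * 0x1p-1074;
        }
    }
    /* Inside one binade doubles are evenly spaced, so a uniform 52-bit
       fraction picks each of them with equal probability. */
    return (1.0 + (double)random_bits(52) * 0x1p-52) * scale;
}
```

The price of reaching every double is several rand() calls per result; the largest possible value is 1 - 2^-53, so the upper end stays open.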

If you don't require every double in the range to be possible, then the difference between open and closed ranges is fairly irrelevant, because in a "true" continuous uniform random distribution, the probability of any exact value occurring is 0 anyway. So you might as well just generate a number in the open range.
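One way to stay strictly inside the open range, offered as an alternative to the question's divide-by-RAND_MAX+2 version rather than as anything this answer proposes, is to return the midpoint of one of RAND_MAX + 1 equal cells of (0, 1). The sketch assumes RAND_MAX + 1 is a power of two no larger than 2^52, so every result is computed exactly in double.

```c
#include <stdlib.h>

/* Maps r in [0, RAND_MAX] to the centre of the r-th of RAND_MAX + 1 equal
   cells of (0, 1). With RAND_MAX + 1 a power of two, the division is exact,
   and the extremes are 1/(2(RAND_MAX+1)) and 1 minus that. */
static double unit_open_midpoint(int r)
{
    return ((double)r + 0.5) / ((double)RAND_MAX + 1.0);
}
```

The smallest and largest results are symmetric about 1/2, and the divisor is a power of two rather than RAND_MAX + 2.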

All that said: yes, your proposed implementations generate values that are in the ranges you say, and for the closed and half-closed ranges they generate the end-points with probability 1/(RAND_MAX+1) or so. That's good enough for many or most practical purposes.

Your fiddling around with +1 and +2 works provided that RAND_MAX+2 is within the range that double can represent exactly. This is true for IEEE double precision and 32 bit int, but it's not actually guaranteed by the C standard.
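The condition can be checked on a given platform: a binary floating-point type with p significand bits represents every integer from 0 to 2^p exactly, so RAND_MAX + 2 must not exceed 2^p. The helper below is a hypothetical sketch assuming FLT_RADIX == 2; p comes from DBL_MANT_DIG, FLT_MANT_DIG or LDBL_MANT_DIG.

```c
#include <float.h>   /* DBL_MANT_DIG, FLT_MANT_DIG, LDBL_MANT_DIG for callers */
#include <stdlib.h>  /* RAND_MAX for callers */

/* Returns 1 if every integer in [0, n] is exactly representable in a binary
   floating-point type with mant_dig significand bits (assumes FLT_RADIX == 2). */
static int integers_exact_up_to(unsigned long long n, int mant_dig)
{
    if (mant_dig >= 64)
        return 1;   /* every unsigned long long value fits, e.g. x87 long double */
    return n <= (1ULL << mant_dig);
}
```

With the common RAND_MAX of 2147483647, `integers_exact_up_to((unsigned long long)RAND_MAX + 2, DBL_MANT_DIG)` holds for a 53-bit double but fails with FLT_MANT_DIG == 24, which is one concrete way float differs from double here.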

(I'm ignoring your use of long double because it confuses things a bit. It's guaranteed to be at least as big as double, but there are common implementations in which it's exactly the same as double, so the long doesn't add anything except uncertainty).

answered Nov 15 '22 08:11 by Steve Jessop


This question is not ready for answering because the problem has been incompletely specified. In particular, no specification has been stated for how finely the set of values that can be generated should be distributed. For illustration, consider generating values for [0, 1], and consider a floating-point format with representable values:

0, 1/16, 2/16, 3/16, 4/16, 6/16, 8/16, 12/16, 1.

Several distributions over these values might be considered “uniform”:

  • Select each with equal probability. This is uniform over the discrete values but does not have a uniform density over the real distances between the values.
  • Select each with some probability proportional to the density of representable values in its vicinity.
  • Select 0, 4/16, 8/16, 12/16, and 1 with equal probability, to maintain the same granularity over the interval.

I doubt the first of these was intended, and I will dismiss it. The second is similar to a suggestion by Steve Jessop, but it is still incompletely specified. Should 0 be selected with a probability proportional to the interval from it to the midpoint between it and the next point? (This would give a probability of 1/32.) Or should it be associated with an interval centered on it, from -1/32 to 1/32? (This would give it a probability of 1/17, presuming 1 were also allocated an interval extended 1/32 beyond itself.)
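Both figures for the probability of 0 can be reproduced by giving each representable value the part of the line closer to it than to its neighbours. The function below is a hypothetical helper for this small example only; pad is how far the two end cells reach past the first and last values.

```c
/* Probability of v[i] when each value in the sorted array v owns the stretch
   of the line nearer to it than to its neighbours. The end cells extend pad
   beyond v[0] and v[n-1] (pad = 0 stops them at the endpoints). */
static double cell_probability(const double *v, int n, int i, double pad)
{
    double lo = (i == 0)     ? v[0] - pad     : (v[i - 1] + v[i]) / 2;
    double hi = (i == n - 1) ? v[n - 1] + pad : (v[i] + v[i + 1]) / 2;
    double total = (v[n - 1] + pad) - (v[0] - pad);
    return (hi - lo) / total;
}
```

With pad = 0 the nine cells have widths 1/32, 2/32, 2/32, 2/32, 3/32, 4/32, 6/32, 8/32 and 4/32, summing to 1. With pad = 1/32 the total length becomes 17/16 and 0 receives (1/16)/(17/16) = 1/17.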

You might reason that this is a closed interval, so it should stop at 0 and at 1. But suppose we had, for some application, chopped a distribution over [0, 2] into the intervals [0, 1] and (1, 2]. We would want the union of distributions over the latter two intervals to equal the distribution over the former interval. So our distributions ought to mesh nicely.

The third case has similar issues. Perhaps, if we wish to preserve granularity like this, 0 should be selected with probability 1/8, the three points 1/4, 1/2, and 3/4 with probability 1/4 each, and 1 with probability 1/8.
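Those probabilities are what one gets by rounding a continuous uniform value on [0, 1] to the nearest multiple of 1/4. With a uniform integer k in [0, 8) this can be sketched as follows; grid_point and sample_grid are hypothetical names, and the modulo step assumes RAND_MAX + 1 is a power of two of at least 8, so that it is unbiased.

```c
#include <stdlib.h>

/* Maps k in [0, 2n) to one of the n+1 points 0, 1/n, ..., 1. The endpoints
   receive one value of k each and every interior point two, so for uniform k
   the probabilities are 1/(2n) at 0 and 1, and 1/n elsewhere. */
static double grid_point(unsigned k, unsigned n)
{
    return (double)((k + 1) / 2) / n;
}

/* Hypothetical sampler: rand() % (2n) is unbiased only when 2n divides
   RAND_MAX + 1, e.g. n = 4 with RAND_MAX + 1 a power of two >= 8. */
static double sample_grid(unsigned n)
{
    return grid_point((unsigned)rand() % (2 * n), n);
}
```

With n = 4 this gives 0 and 1 probability 1/8 each, and 1/4, 1/2 and 3/4 probability 1/4 each.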

Aside from these issues of specifying the desired properties of the generators, the code proposed by the questioner has some issues:

  • Presuming that RAND_MAX+1 is a power of two (and thus dividing by it is “nice” in binary floating-point arithmetic), dividing by RAND_MAX or RAND_MAX+2 may cause some irregularities in the generated values. There may be odd quantizations in them.

  • When 1/(RAND_MAX+1) ≤ 1/4 ULP(1), RAND_MAX/(RAND_MAX+1) will round up and return 1 when it should not because the interval is [0, 1). (“ULP(1)” means the unit of least precision for the value 1 in the floating-point format being used.) (This will not have been observed in tests with long double where RAND_MAX fits within the bits of the significand, but it will occur, for example, where RAND_MAX is 2147483647 and the floating-point type is float, with its 24-bit significand.)

  • Multiplying by (b-a) and adding a introduces rounding errors, the consequences of which must be evaluated. There are a number of cases, such as when b-a is small and a is large, when a and b straddle zero (thus causing loss of granularity near b even though finer results are representable), and so on.

  • The lower bound of the results for (0, 1) is the floating-point value nearest 1/(RAND_MAX+2). This bound has no relationship to the fineness of the floating-point values or the desired distribution; it is simply an artifact of the implementation of rand. Values in (0, 1/(RAND_MAX+2)) are omitted without any cause stemming from the problem specification. A similar artifact may exist on the upper end (depending on the particular floating-point format, the rand implementation, and the interval endpoint, b).
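The float case in the second point can be reproduced directly, along with one possible repair: keep only the top FLT_MANT_DIG bits of the rand() result, so the numerator and the quotient are exact in float and the largest result stays below 1. The repair is a swapped-in technique, not part of this answer; it assumes RAND_MAX + 1 is a power of two, and it does nothing about the coarse granularity near 0 described in the last point.

```c
#include <float.h>
#include <stdlib.h>

/* Bits in one rand() result, assuming RAND_MAX + 1 is a power of two. */
static int rand_bits(void)
{
    int bits = 0;
    for (unsigned long m = RAND_MAX; m != 0; m >>= 1)
        bits++;
    return bits;
}

/* The question's [0,1) formula evaluated in float. With RAND_MAX == 2147483647,
   (float)RAND_MAX already rounds to 2^31, so r == RAND_MAX yields exactly 1.0f. */
static float naive_unit_float(int r)
{
    return (float)r / ((float)RAND_MAX + 1.0f);
}

/* Keeps only the top FLT_MANT_DIG bits of r, so the numerator and the
   quotient are exact in float and the largest result is 1 - 2^-keep. */
static float unit_float_from(int r)
{
    int bits = rand_bits();
    int keep = bits < FLT_MANT_DIG ? bits : FLT_MANT_DIG;
    unsigned long top = (unsigned long)r >> (bits - keep);   /* in [0, 2^keep) */
    return (float)top / (float)(1UL << keep);
}
```

A caller would use `unit_float_from(rand())`; passing r explicitly just makes the extreme cases easy to check.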
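The third point is easy to trigger when b - a is small relative to a. With a = 1e16 the spacing of doubles is 2, so a + 0.75*(b - a) for b = a + 2 rounds up to b even though 0.75 < 1. One blunt remedy for the half-open case, again a sketch rather than part of this answer, is to redraw whenever rounding produced b; scale_to and uniform_half_open_ab are hypothetical names. It restores the range guarantee but none of the other properties discussed above.

```c
#include <stdlib.h>

/* The question's affine step. Even with u < 1, rounding can land exactly on b. */
static double scale_to(double u, double a, double b)
{
    return a + u * (b - a);
}

/* Hypothetical [a,b) sampler: redraws whenever rounding produced b.
   Assumes a < b, both finite, and that b - a does not overflow.
   It terminates with probability 1 because u == 0 always yields a. */
static double uniform_half_open_ab(double a, double b)
{
    double x;
    do {
        double u = rand() / ((double)RAND_MAX + 1.0);   /* u in [0, 1) */
        x = scale_to(u, a, b);
    } while (x >= b);
    return x;
}
```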

I submit the reason the questioner encountered unsatisfying answers for this “simple” problem is that it is not a simple problem.

answered Nov 15 '22 06:11 by Eric Postpischil