Use rand() to generate uniformly distributed floating point numbers on (a,b), [a,b), (a,b], and [a,b]

Tags:

c

random

I want to collect the "best" way to generate random numbers on all four types of intervals in one place. I'm sick of Googling this. Search results turn up a lot of crap. Even the relevant results are pages or blogs that are often flat-out wrong or have discussions where self-appointed experts disagree with each other over some technicality, often with their "answers" seemingly exposing that they do not know about the different types (closed, open, semi-open) of intervals. I'm sick of reading bad information about generating random numbers in C for such a "simple" question.

Please show me how to generate uniformly distributed floating-point numbers. Here is how I typically do it (using "long double" as an example) on [a,b], [a,b), (a,b], and (a,b):

#include <stdlib.h> /* rand, srand, RAND_MAX */
#include <time.h>   /* time */

long double a=VALUE1,b=VALUE2;
long double x1,x2,x3,x4;

srand((unsigned)time(NULL));

/* x1 will be an element of [a,b] */
x1=((long double)rand()/RAND_MAX)*(b-a) + a;

/* x2 will be an element of [a,b) */
x2=((long double)rand()/((long double)RAND_MAX+1))*(b-a) + a;

/* x3 will be an element of (a,b] */
x3=(((long double)rand()+1)/((long double)RAND_MAX+1))*(b-a) + a;

/* x4 will be an element of (a,b) */
x4=(((long double)rand()+1)/((long double)RAND_MAX+2))*(b-a) + a;

For the special case of the unit intervals (0,1), [0,1), (0,1], and [0,1]:

long double x1,x2,x3,x4;

srand((unsigned)time(NULL));

/* x1 will be an element of [0,1] */
x1=((long double)rand()/RAND_MAX);

/* x2 will be an element of [0,1) */
x2=((long double)rand()/((long double)RAND_MAX+1));

/* x3 will be an element of (0,1] */
x3=(((long double)rand()+1)/((long double)RAND_MAX+1));

/* x4 will be an element of (0,1) */
x4=(((long double)rand()+1)/((long double)RAND_MAX+2));

I believe the casts on both RAND_MAX and the return value of rand() are necessary, not only to avoid integer division but also because both are ints, and adding one (or two) to them in int arithmetic might overflow.

I think the versions for "double" and "float" are exactly the same, just with the type replaced. Are there any subtleties that arise for the different floating-point types?

Do you see any problems with the above implementations? If so, what are they, and how would you fix them?

EDIT: The above implementations pass the tests that are necessary (though not sufficient) for correctness, at least on a 64-bit Intel Core 2 Duo machine running 64-bit Linux: x1 can generate both 0 and 1, x2 can generate 0 but has not been seen to generate 1, x3 can generate 1 but has not been seen to generate 0, and x4 has not been seen to generate either 0 or 1.


asked Sep 07 '12 18:09

SO Stinks

2 Answers

If you want every double in the range to be possible, with probability proportional to the difference between it and its adjacent double values, then it's actually really hard.

Consider the range [0, 1000]. There is an absolute bucketload of values in the very tiny first part of the range: well over a million of them between 0 and 1000000*DBL_MIN, and DBL_MIN is about 2 * 10^-308. There are more than 2^32 values in the range altogether, so clearly one call to rand() isn't enough to generate them all. What you'd need to do is generate the mantissa of your double uniformly, select an exponent with an exponential distribution, and then fudge things a bit to ensure the result is in range.
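A minimal sketch of that approach for the unit interval, under loud assumptions: the names rand_bits64 and unit_all_doubles are hypothetical, rand() is assumed to deliver independent, uniform low 15 bits (the standard only promises RAND_MAX >= 32767 and says nothing about quality), and the target is [0, 1] rather than [0, 1000], because scaling afterwards reintroduces rounding. Conceptually it draws an infinite string of random binary digits 0.b1b2b3... and rounds it to the nearest double: the leading zeros fix the exponent (each one halves the value), the next 64 bits form the significand, and a sticky bit stands in for the rest of the string.

```c
#include <math.h>     /* ldexp */
#include <stdint.h>   /* uint64_t, UINT64_C */
#include <stdlib.h>   /* rand */

/* Hypothetical helper: 64 random bits from rand(), using only the low 15
   bits of each call because the C standard guarantees only
   RAND_MAX >= 32767. Assumes those bits are independent and uniform. */
static uint64_t rand_bits64(void)
{
    uint64_t x = 0;
    for (int i = 0; i < 5; i++)          /* 5 * 15 = 75 >= 64 bits */
        x = (x << 15) | (uint64_t)(rand() & 0x7FFF);
    return x;
}

/* Hypothetical sampler: a double in [0, 1] obtained by treating
   0.b1b2b3... as an infinite string of random bits and rounding it to the
   nearest double. */
static double unit_all_doubles(void)
{
    int exponent = -64;     /* the first word covers bits 1..64 */
    int shift = 0;
    uint64_t significand;

    /* Each all-zero word moves the value 64 binary places further down. */
    while ((significand = rand_bits64()) == 0) {
        exponent -= 64;
        /* Everything left is below 2^(exponent + 64) <= 2^-1075, half the
           smallest subnormal, so it would round to 0 anyway. */
        if (exponent + 64 <= -1075)
            return 0.0;
    }

    /* Count leading zeros (a geometric choice of exponent) and refill the
       low bits vacated by the shift with fresh random bits. */
    while ((significand & (UINT64_C(1) << 63)) == 0) {
        significand <<= 1;
        shift++;
    }
    if (shift != 0) {
        exponent -= shift;
        significand |= rand_bits64() >> (64 - shift);
    }

    /* The infinite tail beyond these 64 bits is almost surely nonzero, so
       set a sticky bit: the conversion to double then never sees an exact
       tie and rounds as the infinite string would. */
    significand |= 1;
    return ldexp((double)significand, exponent);
}
```

The conversion of the 64-bit significand can round up to 2^64, so 1.0 is a possible result and the interval is closed. Apart from a negligible double-rounding effect for subnormal results, each double comes out with probability proportional to the width of the reals that round to it. Some toolchains need -lm to link ldexp.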

If you don't require every double in the range to be possible, then the difference between open and closed ranges is fairly irrelevant, because in a "true" continuous uniform random distribution, the probability of any exact value occurring is 0 anyway. So you might as well just generate a number in the open range.
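If the open range is all you need, one simple way to keep that guarantee even after scaling is to reject endpoint results and draw again. A minimal sketch with a hypothetical name, assuming a < b and at least one double strictly between them (otherwise the loop never ends); it reuses the question's [a,b) formula and discards anything that rounding has pushed onto a or b:

```c
#include <stdlib.h>   /* rand, RAND_MAX */

/* Hypothetical helper: a value strictly inside (a, b).
   Assumes a < b and that some double lies strictly between them. */
static double uniform_open(double a, double b)
{
    double x;
    do {
        /* The question's [a,b) mapping; rounding in the scaling step can
           still land exactly on a or on b, so such results are redrawn. */
        x = (rand() / (RAND_MAX + 1.0)) * (b - a) + a;
    } while (x <= a || x >= b);
    return x;
}
```

For example, uniform_open(1e16, 1e16 + 4.0) can only return 1e16 + 2.0: doubles near 1e16 are 2 apart, and both endpoints are rejected.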

All that said: yes, your proposed implementations generate values that are in the ranges you say, and for the closed and half-closed ranges they generate the end-points with probability 1/(RAND_MAX+1) or so. That's good enough for many or most practical purposes.
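That claim can be checked exactly rather than by sampling. Correctly rounded division, multiplication and addition are monotone, so for a given a < b each of the question's formulas is nondecreasing in the value r returned by rand(); evaluating it at r = 0 and r = RAND_MAX shows which endpoints are reachable. A sketch in double with hypothetical helper names, taking r as a parameter. Writing RAND_MAX + 1.0 or r + 1.0 converts the int operand to double before the addition, so the sum cannot overflow as int:

```c
#include <stdlib.h>   /* RAND_MAX */

/* Hypothetical helpers: the question's four formulas in double, with the
   rand() result r passed in so the extreme values can be tested. */

/* [a,b] */
static double closed_closed(int r, double a, double b)
{
    return (r / (double)RAND_MAX) * (b - a) + a;
}

/* [a,b) */
static double closed_open(int r, double a, double b)
{
    return (r / (RAND_MAX + 1.0)) * (b - a) + a;
}

/* (a,b] */
static double open_closed(int r, double a, double b)
{
    return ((r + 1.0) / (RAND_MAX + 1.0)) * (b - a) + a;
}

/* (a,b) */
static double open_open(int r, double a, double b)
{
    return ((r + 1.0) / (RAND_MAX + 2.0)) * (b - a) + a;
}
```

In use, closed_open(rand(), a, b) plays the role of x2. Checking a = 0 and b = 1 covers only the unit interval; for other a and b the scaling step rounds, so the extremes have to be checked for those particular values.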

Your fiddling around with +1 and +2 works provided that RAND_MAX+2 is within the range of integers that double can represent exactly. This is true for IEEE double precision and 32-bit int, but it's not actually guaranteed by the C standard.
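Whether those integers are exact can be checked rather than assumed. In a binary format with p significand bits, every integer from 0 to 2^p is representable, so it suffices to compare RAND_MAX + 2 against 2^DBL_MANT_DIG (or FLT_MANT_DIG, LDBL_MANT_DIG for the other types). A sketch with a hypothetical helper, assuming FLT_RADIX == 2 and a 64-bit unsigned long long:

```c
#include <float.h>    /* DBL_MANT_DIG, FLT_MANT_DIG, LDBL_MANT_DIG */
#include <stdlib.h>   /* RAND_MAX */

/* Hypothetical helper: nonzero if every integer in [0, n] is exactly
   representable in a binary floating type with mant_dig significand bits.
   Assumes FLT_RADIX == 2 and a 64-bit unsigned long long. */
static int exact_up_to(unsigned long long n, int mant_dig)
{
    if (mant_dig >= 64)
        return 1;                      /* any 64-bit value fits */
    return n <= (1ULL << mant_dig);
}
```

Calling exact_up_to((unsigned long long)RAND_MAX + 2, DBL_MANT_DIG) performs the check for double. With a RAND_MAX of 2147483647 it holds for double's 53 bits but fails for float's 24.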

(I'm ignoring your use of long double because it confuses things a bit. It's guaranteed to be at least as big as double, but there are common implementations in which it's exactly the same as double, so the long doesn't add anything except uncertainty.)

answered Nov 15 '22 08:11

Steve Jessop

This question is not ready for answering because the problem has been incompletely specified. In particular, no specification has been stated for how finely the set of values that can be generated should be distributed. For illustration, consider generating values for [0, 1], and consider a floating-point format with representable values:

0, 1/16, 2/16, 3/16, 4/16, 6/16, 8/16, 12/16, 1.

Several distributions over these values might be considered “uniform”:

1. Select each with equal probability. This is uniform over the discrete values but does not have a uniform density over the real distances between the values.
2. Select each with some probability proportional to the spacing of representable values in its vicinity (that is, inversely proportional to their density).
3. Select 0, 4/16, 8/16, 12/16, and 1 with equal probability, to maintain the same granularity over the interval.

I doubt the first of these was intended, and I will dismiss it. The second is similar to a suggestion by Steve Jessop, but it is still incompletely specified. Should 0 be selected with a probability proportional to the interval from it to the midpoint between it and the next point? (This would give a probability of 1/32.) Or should it be associated with an interval centered on it, from -1/32 to 1/32? (This would give it a probability of 1/17, presuming 1 were also allocated an interval extending 1/32 beyond itself.)
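The two readings of the second option can be compared numerically. A small sketch with hypothetical names that measures each toy value's cell (from the midpoint with its left neighbour to the midpoint with its right neighbour) in units of 1/16; the parameter ext says how far the two end cells reach beyond 0 and 1:

```c
#include <stddef.h>   /* size_t */

/* The toy format's representable values in [0, 1], in units of 1/16. */
static const double toy[] = {0, 1, 2, 3, 4, 6, 8, 12, 16};
#define TOY_N (sizeof toy / sizeof toy[0])

/* Probability of toy[i] when each value owns the interval between the
   midpoints to its neighbours, and the outermost cells extend ext (in units
   of 1/16) beyond 0 and 1. ext = 0 stops the cells at the interval's ends. */
static double toy_prob(size_t i, double ext)
{
    double lo = (i == 0) ? toy[0] - ext : (toy[i - 1] + toy[i]) / 2;
    double hi = (i == TOY_N - 1) ? toy[i] + ext : (toy[i] + toy[i + 1]) / 2;
    double total = toy[TOY_N - 1] - toy[0] + 2 * ext;
    return (hi - lo) / total;
}
```

With ext = 0 the value 0 gets probability 1/32; with ext = 0.5 (that is, 1/32 beyond each end) it gets 1/17, matching the two figures given for the second option.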

You might reason that this is a closed interval, so it should stop at 0 and at 1. But suppose we had, for some application, chopped a distribution over [0, 2] into the intervals [0, 1] and (1, 2]. We would want the union of distributions over the latter two intervals to equal the distribution over the former interval. So our distributions ought to mesh nicely.

The third case has similar issues. Perhaps, if we wish to preserve granularity like this, 0 should be selected with probability 1/8, the three points 1/4, 1/2, and 3/4 with probability 1/4 each, and 1 with probability 1/8.
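That variant of the third option, with the end points given half weight, amounts to rounding a uniform value on [0, 1) to the nearest multiple of 1/steps. A sketch with a hypothetical name, taking the rand() result r and its maximum rmax as parameters; the proportions come out exact when steps is a power of two and rmax + 1 is a power of two of at least 2 * steps (for example steps = 4 and rmax + 1 = 32768):

```c
/* Hypothetical helper: round u = r / (rmax + 1), a value on [0, 1), to the
   nearest multiple of 1/steps. 0 and 1 each receive half the weight of an
   interior grid point. */
static double coarse_grid(long r, long rmax, int steps)
{
    double u = r / (rmax + 1.0);
    long k = (long)(u * steps + 0.5);   /* truncation equals floor here, as u >= 0 */
    return (double)k / steps;
}
```

In use, coarse_grid(rand(), RAND_MAX, 4) yields 0 and 1 with probability 1/8 each and 1/4, 1/2 and 3/4 with probability 1/4 each, under the power-of-two assumption above.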

Aside from these issues of specifying the desired properties of the generators, the code proposed by the questioner has some issues:

Presuming that RAND_MAX+1 is a power of two (and thus dividing by it is “nice” in binary floating-point arithmetic), dividing by RAND_MAX or RAND_MAX+2 may cause some irregularities in the generated values: those quotients must be rounded, so there may be odd quantizations in them.
When 1/(RAND_MAX+1) ≤ 1/4 ULP(1), RAND_MAX/(RAND_MAX+1) will round up and return 1 when it should not because the interval is [0, 1). (“ULP(1)” means the unit of least precision for the value 1 in the floating-point format being used.) (This will not have been observed in tests with long double where RAND_MAX fits within the bits of the significand, but it will occur, for example, where RAND_MAX is 2147483647 and the floating-point type is float, with its 24-bit significand.)
Multiplying by (b-a) and adding a introduces rounding errors, the consequences of which must be evaluated. There are a number of cases, such as when b-a is small and a is large, when a and b straddle zero (thus causing loss of granularity near b even though finer results are representable), and so on.
The lower bound of the results for (0, 1) is the floating-point value nearest 1/(RAND_MAX+2). This bound has no relationship to the fineness of the floating-point values or the desired distribution; it is simply an artifact of the implementation of rand. Values in (0, 1/(RAND_MAX+2)) are omitted without any cause stemming from the problem specification. A similar artifact may exist on the upper end (depending on the particular floating-point format, the rand implementation, and the interval endpoint, b).
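The second and third issues listed above can be reproduced with fixed inputs instead of waiting for rand() to produce them. A sketch with hypothetical helper names that simulates RAND_MAX = 2147483647 (the value glibc uses): in float, the [0,1) formula returns exactly 1 for the largest r; in double, scaling a u just below 1 onto [1e16, 1e16 + 2) returns b, because doubles near 1e16 are 2 apart. The last helper is one possible remedy for float, assuming RAND_MAX + 1 is a power of two: it keeps only the top FLT_MANT_DIG bits of r so the division is exact.

```c
#include <float.h>    /* FLT_MANT_DIG */

/* Hypothetical helper: the question's [0,1) formula evaluated in float,
   with r and rmax passed in so any RAND_MAX can be simulated. */
static float unit_closed_open_float(long r, long rmax)
{
    return (float)r / ((float)rmax + 1.0f);
}

/* Hypothetical helper: the question's [a,b) scaling step in double,
   taking u = rand() / (RAND_MAX + 1.0) directly. */
static double scale_closed_open(double u, double a, double b)
{
    return u * (b - a) + a;
}

/* Hypothetical remedy for float, assuming rmax + 1 == 2^rbits: drop the low
   bits of r so the quotient is exact and strictly below 1. */
static float unit_closed_open_float_fixed(long r, int rbits)
{
    if (rbits > FLT_MANT_DIG) {
        r >>= rbits - FLT_MANT_DIG;
        rbits = FLT_MANT_DIG;
    }
    return (float)r / (float)(1ULL << rbits);
}
```

The remedy only addresses the round-up in float; the scaling problem remains for any type whenever b - a is small relative to a, and rejecting endpoint results is one way to handle it.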

I submit the reason the questioner encountered unsatisfying answers for this “simple” problem is that it is not a simple problem.

answered Nov 15 '22 06:11

Eric Postpischil
