In this StackOverflow question:

Generating random integer from a range

the accepted answer suggests the following formula for generating a random integer between given min and max, with min and max included in the range:

output = min + (rand() % (int)(max - min + 1))

But it also says that

This is still slightly biased towards lower numbers ... It's also possible to extend it so that it removes the bias.

But it doesn't explain why it is biased towards lower numbers or how to remove the bias. So the question is: is this the optimal approach to generating a random integer within a (signed) range while relying on nothing fancier than rand(), and if it is, how can the bias be removed?
EDIT:
I've just tested the while-loop algorithm suggested by @Joey against floating-point extrapolation:

static const double s_invRandMax = 1.0/((double)RAND_MAX + 1.0);
return min + (int)(((double)(max + 1 - min))*rand()*s_invRandMax);

to see how uniformly the "balls" are "falling" into and being distributed among a number of "buckets", one test for the floating-point extrapolation and another for the while-loop algorithm. But the results turned out to vary depending on the number of "balls" (and "buckets"), so I couldn't easily pick a winner. The working code can be found at this Ideone page. For example, with 10 buckets and 100 balls the maximum deviation from the ideal probability among buckets is smaller for the floating-point extrapolation than for the while-loop algorithm (0.04 and 0.05 respectively), but with 1000 balls the maximum deviation of the while-loop algorithm is smaller (0.024 and 0.011), and with 10000 balls the floating-point extrapolation is again doing better (0.0034 and 0.0053), and so on without much consistency. Thinking of the possibility that neither algorithm consistently produces a more uniform distribution than the other makes me lean towards the floating-point extrapolation, since it appears to be faster than the while-loop algorithm. So is it fine to choose the floating-point extrapolation algorithm, or are my tests/conclusions not entirely correct?
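For reference, this is roughly what such a balls-into-buckets test looks like (a minimal sketch of my own; the actual code on the Ideone page differs in its details, and the BUCKETS/BALLS constants are just example values):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define BUCKETS 10     /* example value, not from the original test */
#define BALLS   10000  /* example value, not from the original test */

int main(void)
{
    static const double s_invRandMax = 1.0 / ((double)RAND_MAX + 1.0);
    int counts[BUCKETS] = {0};
    int min = 0, max = BUCKETS - 1;

    /* throw BALLS balls into BUCKETS buckets using the floating-point extrapolation */
    for (int i = 0; i < BALLS; ++i) {
        int v = min + (int)(((double)(max + 1 - min)) * rand() * s_invRandMax);
        counts[v]++;
    }

    /* report the maximum deviation of the observed frequencies from the ideal 1/BUCKETS */
    double maxDev = 0.0;
    for (int i = 0; i < BUCKETS; ++i) {
        double dev = fabs((double)counts[i] / BALLS - 1.0 / BUCKETS);
        if (dev > maxDev) maxDev = dev;
    }
    printf("maximum deviation from ideal probability: %f\n", maxDev);
    return 0;
}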
The problem is that you're doing a modulo operation. This would be no problem if RAND_MAX were evenly divisible by your modulus, but usually that is not the case. As a very contrived example, assume RAND_MAX to be 10 (so rand() yields the 11 values 0 through 10) and your modulus to be 3. You'll get the following possible random numbers and the following resulting remainders:
x:      0  1  2  3  4  5  6  7  8  9  10
x % 3:  0  1  2  0  1  2  0  1  2  0  1
As you can see, 0 and 1 are slightly more probable than 2.
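Counting the remainders makes the imbalance explicit; the following small sketch (my own addition, not part of the original answer) enumerates the 11 equally likely values and tallies how often each remainder occurs:

#include <stdio.h>

int main(void)
{
    int counts[3] = {0, 0, 0};
    for (int x = 0; x <= 10; ++x)  /* the 11 equally likely rand() outcomes */
        counts[x % 3]++;
    for (int r = 0; r < 3; ++r)
        printf("remainder %d occurs %d times\n", r, counts[r]);
    /* prints 4, 4 and 3: remainders 0 and 1 are more likely than 2 */
    return 0;
}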
One option to solve this is rejection sampling: by disallowing the numbers 9 and 10 above you can make the resulting distribution uniform again. The tricky part is figuring out how to do so efficiently. A very nice example (one that took me two days to understand why it works) can be found in Java's java.util.Random.nextInt(int) method.

The reason Java's algorithm is a little tricky is that it avoids slow operations like multiplication and division for the check. If you don't care too much, you can also do it the naïve way:
int n = (int)(max - min + 1);        /* size of the target range */
int remainder = RAND_MAX % n;
int x, output;
do {
    x = rand();
    output = x % n;
} while (x >= RAND_MAX - remainder); /* retry when x falls into the leftover values at the top */
return min + output;
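For completeness, the loop can be wrapped into a small self-contained helper; the function name rand_range below is my own choice for illustration, not something from the answer:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* A minimal sketch wrapping the rejection loop above into a helper. */
static int rand_range(int min, int max)
{
    int n = (int)(max - min + 1);
    int remainder = RAND_MAX % n;
    int x, output;
    do {
        x = rand();
        output = x % n;
    } while (x >= RAND_MAX - remainder);
    return min + output;
}

int main(void)
{
    srand((unsigned)time(NULL));
    for (int i = 0; i < 10; ++i)
        printf("%d ", rand_range(-3, 3));  /* inclusive range [-3, 3] */
    printf("\n");
    return 0;
}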
EDIT: Corrected a fencepost error in the above code; now it works as it should. I also created a little sample program (C#; taking a uniform PRNG for numbers between 0 and 15 and constructing a PRNG for numbers between 0 and 6 from it in various ways):
using System;

class Rand {
    static Random r = new Random();

    // Uniform source: numbers between 0 and 15.
    static int Rand16() {
        return r.Next(16);
    }

    // Naive modulo: biased, because 16 is not a multiple of 7.
    static int Rand7Naive() {
        return Rand16() % 7;
    }

    // Floating-point scaling: also biased, just differently.
    static int Rand7Float() {
        return (int)(Rand16() / 16.0 * 7);
    }

    // corrected
    static int Rand7RejectionNaive() {
        int n = 7, remainder = 16 % n, x, output;
        do {
            x = Rand16();
            output = x % n;
        } while (x >= 16 - remainder);
        return output;
    }

    // adapted to fit the constraints of this example
    static int Rand7RejectionJava() {
        int n = 7, x, output;
        do {
            x = Rand16();
            output = x % n;
        } while (x - output + 6 > 15);
        return output;
    }

    static void Test(Func<int> rand, string name) {
        var buckets = new int[7];
        for (int i = 0; i < 10000000; i++) buckets[rand()]++;
        Console.WriteLine(name);
        for (int i = 0; i < 7; i++) Console.WriteLine("{0}\t{1}", i, buckets[i]);
    }

    static void Main() {
        Test(Rand7Naive, "Rand7Naive");
        Test(Rand7Float, "Rand7Float");
        Test(Rand7RejectionNaive, "Rand7RejectionNaive");
    }
}
The result is as follows (pasted into Excel, with conditional coloring of cells added so that the differences are more apparent):

[results table image]
Now that I have fixed my mistake in the above rejection sampling, it works as it should (before, it would bias 0). As you can see, the float method isn't perfect at all; it just distributes the biased numbers differently.
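The bias can also be worked out exactly, without the simulation, by enumerating all 16 equally likely source values. The sketch below (my own addition, in C rather than C#) mirrors Rand7Naive and Rand7Float and shows that the naive modulo hands the outputs 0 and 1 three chances out of 16 instead of two, while the floating-point scaling hands the extra chances to 0 and 3 instead:

#include <stdio.h>

int main(void)
{
    int naive[7] = {0}, flt[7] = {0};

    /* enumerate the 16 equally likely source values 0..15 */
    for (int x = 0; x < 16; ++x) {
        naive[x % 7]++;                 /* what Rand7Naive would return */
        flt[(int)(x / 16.0 * 7)]++;     /* what Rand7Float would return */
    }

    printf("output\tnaive\tfloat\n");
    for (int i = 0; i < 7; ++i)
        printf("%d\t%d\t%d\n", i, naive[i], flt[i]);
    /* naive: outputs 0 and 1 get 3 of the 16 source values each;
       float: outputs 0 and 3 get the extra source values instead */
    return 0;
}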