Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate a random number from specified discrete distribution?

Lets say we have some discrete distribution with finite number of possible results, is it possible to generate a random number from this distribution faster than in O(logn), where n is number possible results?

How to make it in O(logn):
- Make an array with cumulative probability (Array[i] = Probability that random number will be less or equal to i)
- Generate random number from uniform distribution (lets denote it by k)
- Find the smallest i such that k < Array[i]. It can be done using binary search.
- i is our random number.

like image 360
Tomek Tarczynski Avatar asked Nov 17 '10 17:11

Tomek Tarczynski


1 Answers

Walker's alias method can draw a sample in constant worst-case time, using some auxiliary arrays of size n which need to be precomputed. This method is described in Chapter 3 of Devroye's book on sampling and is implemented in the R sample() function. You can get code from R's source code or this thread. A 1991 paper by Vose claims to reduce the initialization cost.

Note that your question isn't well-defined unless you specify the exact form of the input and how many random numbers you want to draw. For example, if the input is an array giving the probability of each result, then your algorithm is not O(log n) because it requires first computing the cumulative probabilities which takes O(n) time from the input array.

If you intend to draw many samples then the cost of generating a single sample is not so important. Instead what matters is the total cost to generate m results, and the peak memory required. In this regard, the alias method very good. If you want to generate the samples all at once, use the O(n+m) algorithm posted here and then shuffle the results.

like image 79
Tom Minka Avatar answered Oct 02 '22 20:10

Tom Minka