Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Random number generation algorithm

I encountered a naive algorithm for random number generation that produce a series of numbers as follows:

for (int i = 0; i < MAX; i++)
   if (rand.nextInt(100) >= 100 - probability) // probability is between 0 and 100
       randomNumbersList.add(i);

I was wondering if there's a way to achieve statistically equivalent results without iterating through each number between 0 and MAX.

like image 217
traveh Avatar asked Dec 25 '22 17:12

traveh


2 Answers

Let's denote p=probability/100 and q=1-p.

Consider what will be the first number to be added. With probability q it is 0; with probability (1-q)*q it is 1, with probability (1-q)^2*q it is 2 and so on. This is the geometric distribution. You can easily generate a random number distributed according to geometric distribution using the following approach: generate a random number u uniformly distributed in [0,1] and calculate x=⌊ln(u)/ln(q)⌋ — this x will have geometric distribution (see this question).

So this is how you can calculate the first number to add.

Now consider the difference between the second and the first numbers. It will also be distributed geometrically (only starting at 1, not at 0), so you can calculate this difference the same way and thus obtain the second number, and so on.

A pseudocode will be something like

cur = -1
lnq = ln(q)
while true
    u = random(0,1)   // float!
    cur = cur + 1 + floor(ln(u)/lnq)
    if cur >= MAX
        break
    randomNumbersList.add(cur);

Corresponding Java code by @traveh

List<Integer> randomNumbersList = new LinkedList<Integer>();
int cur = -1;
double p = probability / 100;
double q = 1 - p;
double lnq = Math.log(q);
Random random = new Random();
while (true) {
    double u = random.nextDouble();
    cur = cur + 1 + (int)Math.floor(Math.log(u) / lnq);
    if (cur >= MAX)
        break;
    randomNumbersList.add(cur);
}
like image 123
Petr Avatar answered Jan 07 '23 09:01

Petr


Your algorithm creates a list of up to MAX elements. Each element is an integer from 0 to MAX-1, with no duplicates. Since rand.nextInt(n) returns numbers x uniformly distributed between 0 and n such that 0 <= x < n, x >= 100-p should be always false if p == 0 (x is never 100), and always true if p == 100 (x is always >= 0). The expected number of elements is therefore MAX*(p/100.0).

This can be improved significantly if MAX is high but p is low: in most cases, you would be throwing your weighted coin, but it would come up tails and you would not add anything. Wasted work. If, however, p is high (say, over .5), then you will typically be generating on-the-order-of MAX elements; and it is unlikely that you can make things much faster (you should expect to do O(MAX) work to create O(MAX) random elements). If MAX is small, there's little difference between approaches - so I would stick to the simpler one: the one you already have.

Assuming large MAX and small p

We can model the length of the list using the well-known binomial distribution (as you are throwing MAX non-fair coins, which land "heads" with probability p). Java code is available in the Colt library. Using their classes, this should work:

 Binomial b = new Binomial(MAX, p, new MersenneTwister());
 int heads = b.nextInt();

And now we need to generate "heads" sorted integers between 0 and MAX-1. Let us assume that MAX is much larger than heads. We can use

 TreeSet<Integer> chosen = new TreeSet<>();
 for (int i=0, r=0; i<heads; i++) {
     do { r = random.nextInt(MAX) } while (chosen.contains(r));
     chosen.add(r);
 }

Note that this has horrible performance when p is high, because the inner loop will be executed several times; but for such cases, your initial algorithm is already good enough.

When p is low, the proposed algorithm will require time proportional to MAX*(p/100) instead of MAX. This should more than compensate for the cost of maintaining the TreeSet ordered.

like image 23
tucuxi Avatar answered Jan 07 '23 09:01

tucuxi