I encountered a naive algorithm for random number generation that produce a series of numbers as follows:
for (int i = 0; i < MAX; i++)
if (rand.nextInt(100) >= 100 - probability) // probability is between 0 and 100
randomNumbersList.add(i);
I was wondering if there's a way to achieve statistically equivalent results without iterating through each number between 0 and MAX.
Let's denote p=probability/100
and q=1-p
.
Consider what will be the first number to be added. With probability q
it is 0; with probability (1-q)*q
it is 1, with probability (1-q)^2*q
it is 2 and so on. This is the geometric distribution. You can easily generate a random number distributed according to geometric distribution using the following approach: generate a random number u
uniformly distributed in [0,1] and calculate x=⌊ln(u)/ln(q)⌋
— this x
will have geometric distribution (see this question).
So this is how you can calculate the first number to add.
Now consider the difference between the second and the first numbers. It will also be distributed geometrically (only starting at 1, not at 0), so you can calculate this difference the same way and thus obtain the second number, and so on.
A pseudocode will be something like
cur = -1
lnq = ln(q)
while true
u = random(0,1) // float!
cur = cur + 1 + floor(ln(u)/lnq)
if cur >= MAX
break
randomNumbersList.add(cur);
Corresponding Java code by @traveh
List<Integer> randomNumbersList = new LinkedList<Integer>();
int cur = -1;
double p = probability / 100;
double q = 1 - p;
double lnq = Math.log(q);
Random random = new Random();
while (true) {
double u = random.nextDouble();
cur = cur + 1 + (int)Math.floor(Math.log(u) / lnq);
if (cur >= MAX)
break;
randomNumbersList.add(cur);
}
Your algorithm creates a list of up to MAX elements. Each element is an integer from 0 to MAX-1, with no duplicates. Since rand.nextInt(n)
returns numbers x uniformly distributed between 0 and n such that 0 <= x < n, x >= 100-p
should be always false if p == 0 (x is never 100), and always true if p == 100 (x is always >= 0). The expected number of elements is therefore MAX*(p/100.0).
This can be improved significantly if MAX
is high but p
is low: in most cases, you would be throwing your weighted coin, but it would come up tails and you would not add anything. Wasted work. If, however, p
is high (say, over .5), then you will typically be generating on-the-order-of MAX elements; and it is unlikely that you can make things much faster (you should expect to do O(MAX) work to create O(MAX) random elements). If MAX
is small, there's little difference between approaches - so I would stick to the simpler one: the one you already have.
Assuming large MAX
and small p
We can model the length of the list using the well-known binomial distribution (as you are throwing MAX non-fair coins, which land "heads" with probability p
). Java code is available in the Colt library. Using their classes, this should work:
Binomial b = new Binomial(MAX, p, new MersenneTwister());
int heads = b.nextInt();
And now we need to generate "heads" sorted integers between 0 and MAX-1. Let us assume that MAX is much larger than heads. We can use
TreeSet<Integer> chosen = new TreeSet<>();
for (int i=0, r=0; i<heads; i++) {
do { r = random.nextInt(MAX) } while (chosen.contains(r));
chosen.add(r);
}
Note that this has horrible performance when p
is high, because the inner loop will be executed several times; but for such cases, your initial algorithm is already good enough.
When p
is low, the proposed algorithm will require time proportional to MAX*(p/100) instead of MAX. This should more than compensate for the cost of maintaining the TreeSet
ordered.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With