In some cases, a loop needs to run for a random number of iterations that ranges from min
to max
, inclusive. One working solution is to do something like this:
int numIterations = randomInteger(min, max);
for (int i = 0; i < numIterations; i++) {
/* ... fun and exciting things! ... */
}
A common mistake that many beginning programmers make is to do this:
for (int i = 0; i < randomInteger(min, max); i++) {
/* ... fun and exciting things! ... */
}
This recomputes the loop upper bound on each iteration.
I suspect that this does not give a uniform distribution of the number of times the loop will iterate that ranges from min
to max
, but I'm not sure exactly what distribution you do get when you do something like this. Does anyone know what the distribution of the number of loop iterations will be?
As a specific example: suppose that min
= 0 and max
= 2. Then there are the following possibilities:
i = 0
, the random value is 0. The loop runs 0 times.i = 0
, the random value is nonzero. Then:
i = 1
, the random value is 0 or 1. Then the loop runs 1 time.i = 1
, the random value is 2. Then the loop runs 2 times.The probability of this first event is 1/3. The second event has probability 2/3, and within it, the first subcase has probability 2/3 and the second event has probability 1/3. Therefore, the average number of distributions is
0 × 1/3 + 1 × 2/3 × 2/3 + 2 × 2/3 × 1/3
= 0 + 4/9 + 4/9
= 8/9
Note that if the distribution were indeed uniform, we'd expect to get 1 loop iteration, but now we only get 8/9 on average. My question is whether it's possible to generalize this result to get a more exact value on the number of iterations.
Thanks!
Final edit (maybe!). I'm 95% sure that this isn't one of the standard distributions that are appropriate. I've put what the distribution is at the bottom of this post, as I think the code that gives the probabilities is more readable! A plot for the mean number of iterations against max
is given below.
Interestingly, the number of iterations tails off as you increase max. Would be interesting if someone else could confirm this with their code.
If I were to start modelling this, I would start with the geometric distribution, and try to modify that. Essentially we're looking at a discrete, bounded distribution. So we have zero or more "failures" (not meeting the stopping condition), followed by one "success". The catch here, compared to the geometric or Poisson, is that the probability of success changes (also, like the Poisson, the geometric distribution is unbounded, but I think structurally the geometric is a good base). Assuming min=0, the basic mathematical form for P(X=k), 0 <= k <= max, where k is the number of iterations the loop runs, is, like the geometric distribution, the product of k failure terms and 1 success term, corresponding to k "false"s on the loop condition and 1 "true". (Note that this holds even to calculate the last probability, as the chance of stopping is then 1, which obviously makes no difference to a product).
Following on from this, an attempt to implement this in code, in R, looks like this:
fx = function(k,maximum)
{
n=maximum+1;
failure = factorial(n-1)/factorial(n-1-k) / n^k;
success = (k+1) / n;
failure * success
}
This assumes min=0, but generalizing to arbitrary min
s isn't difficult (see my comment on the OP). To explain the code. First, as shown by the OP, the probabilities all have (min+1)
as a denominator, so we calculate the denominator, n
. Next, we calculate the product of the failure terms. Here factorial(n-1)/factorial(n-1-k)
means, for example, for min=2, n=3 and k=2: 2*1. And it generalises to give you (n-1)(n-2)... for the total probability of failure. The probability of success increases as you get further into the loop, until finally, when k=maximum
, it is 1.
Plotting this analytic formula gives the same results as the OP, and the same shape as the simulation plotted by John Kugelman.
Incidentally the R code to do this is as follows
plot_probability_mass_function = function(maximum)
{
x=0:maximum;
barplot(fx(x,max(x)), names.arg=x, main=paste("max",maximum), ylab="P(X=x)");
}
par(mfrow=c(3,1))
plot_probability_mass_function(2)
plot_probability_mass_function(10)
plot_probability_mass_function(100)
Mathematically, the distribution is, if I've got my maths right, given by:
which simplifies to
(thanks a bunch to http://www.codecogs.com/latex/eqneditor.php)
The latter is given by the R function
function(x,m) { factorial(m)*(x+1)/(factorial(m-x)*(m+1)^(x+1)) }
Plotting the mean number of iterations is done like this in R
meanf = function(minimum)
{
x = 0:minimum
probs = f(x,minimum)
x %*% probs
}
meanf = function(maximum)
{
x = 0:maximum
probs = f(x,maximum)
x %*% probs
}
par(mfrow=c(2,1))
max_range = 1:10
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
max_range = 1:100
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With