Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

On average, how many times will this incorrect loop iterate?

In some cases, a loop needs to run for a random number of iterations that ranges from min to max, inclusive. One working solution is to do something like this:

int numIterations = randomInteger(min, max);
for (int i = 0; i < numIterations; i++) {
   /* ... fun and exciting things! ... */
}

A common mistake that many beginning programmers make is to do this:

for (int i = 0; i < randomInteger(min, max); i++) {
   /* ... fun and exciting things! ... */
}

This recomputes the loop upper bound on each iteration.

I suspect that this does not give a uniform distribution of the number of times the loop will iterate that ranges from min to max, but I'm not sure exactly what distribution you do get when you do something like this. Does anyone know what the distribution of the number of loop iterations will be?

As a specific example: suppose that min = 0 and max = 2. Then there are the following possibilities:

  • When i = 0, the random value is 0. The loop runs 0 times.
  • When i = 0, the random value is nonzero. Then:
    • When i = 1, the random value is 0 or 1. Then the loop runs 1 time.
    • When i = 1, the random value is 2. Then the loop runs 2 times.

The probability of this first event is 1/3. The second event has probability 2/3, and within it, the first subcase has probability 2/3 and the second event has probability 1/3. Therefore, the average number of distributions is

0 × 1/3 + 1 × 2/3 × 2/3 + 2 × 2/3 × 1/3

= 0 + 4/9 + 4/9

= 8/9

Note that if the distribution were indeed uniform, we'd expect to get 1 loop iteration, but now we only get 8/9 on average. My question is whether it's possible to generalize this result to get a more exact value on the number of iterations.

Thanks!

like image 940
templatetypedef Avatar asked Apr 19 '13 19:04

templatetypedef


1 Answers

Final edit (maybe!). I'm 95% sure that this isn't one of the standard distributions that are appropriate. I've put what the distribution is at the bottom of this post, as I think the code that gives the probabilities is more readable! A plot for the mean number of iterations against max is given below.

enter image description here

Interestingly, the number of iterations tails off as you increase max. Would be interesting if someone else could confirm this with their code.

If I were to start modelling this, I would start with the geometric distribution, and try to modify that. Essentially we're looking at a discrete, bounded distribution. So we have zero or more "failures" (not meeting the stopping condition), followed by one "success". The catch here, compared to the geometric or Poisson, is that the probability of success changes (also, like the Poisson, the geometric distribution is unbounded, but I think structurally the geometric is a good base). Assuming min=0, the basic mathematical form for P(X=k), 0 <= k <= max, where k is the number of iterations the loop runs, is, like the geometric distribution, the product of k failure terms and 1 success term, corresponding to k "false"s on the loop condition and 1 "true". (Note that this holds even to calculate the last probability, as the chance of stopping is then 1, which obviously makes no difference to a product).

Following on from this, an attempt to implement this in code, in R, looks like this:

fx = function(k,maximum)
{
    n=maximum+1;
    failure = factorial(n-1)/factorial(n-1-k) / n^k;
    success = (k+1) / n;
    failure * success
}

This assumes min=0, but generalizing to arbitrary mins isn't difficult (see my comment on the OP). To explain the code. First, as shown by the OP, the probabilities all have (min+1) as a denominator, so we calculate the denominator, n. Next, we calculate the product of the failure terms. Here factorial(n-1)/factorial(n-1-k) means, for example, for min=2, n=3 and k=2: 2*1. And it generalises to give you (n-1)(n-2)... for the total probability of failure. The probability of success increases as you get further into the loop, until finally, when k=maximum, it is 1.

Plotting this analytic formula gives the same results as the OP, and the same shape as the simulation plotted by John Kugelman.

enter image description here

Incidentally the R code to do this is as follows

plot_probability_mass_function = function(maximum)
{
    x=0:maximum;
    barplot(fx(x,max(x)), names.arg=x, main=paste("max",maximum), ylab="P(X=x)");
}

par(mfrow=c(3,1))
plot_probability_mass_function(2)
plot_probability_mass_function(10)
plot_probability_mass_function(100)

Mathematically, the distribution is, if I've got my maths right, given by:

enter image description here

which simplifies to

enter image description here

(thanks a bunch to http://www.codecogs.com/latex/eqneditor.php)

The latter is given by the R function

function(x,m) { factorial(m)*(x+1)/(factorial(m-x)*(m+1)^(x+1)) }

Plotting the mean number of iterations is done like this in R

meanf = function(minimum)
{
    x = 0:minimum
    probs = f(x,minimum)
    x %*% probs
}

meanf = function(maximum)
{
    x = 0:maximum
    probs = f(x,maximum)
    x %*% probs
}

par(mfrow=c(2,1))
max_range = 1:10
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
max_range = 1:100
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
like image 67
TooTone Avatar answered Nov 16 '22 00:11

TooTone