Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating sorted random ints without the sort? O(n)

Just been looking at a code golf question about generating a sorted list of 100 random integers. What popped into my head, however, was the idea that you could generate instead a list of positive deltas, and just keep adding them to a running total, thus:

deltas: 1 3 2  7  2
ints:   1 4 6 13 15

In fact, you would use floats, then normalise to fit some upper limit, and round, but the effect is the same.

Although it wouldn't make for shorter code, it would certainly be faster without the sort step. But the thing I have no real handle on is this: Would the resulting distribution of integers be the same as generating 100 random integers from a uniformly distributed probability density function?

Edit: A sample script:

import random,sys
running = 0
max = 1000
deltas = [random.random() for i in range(0,11)]
floats = []
for d in deltas:
    running += d
    floats.append(running)
upper = floats.pop()
ints = [int(round(f/upper*max)) for f in floats]
print(ints)

Whose output (fair dice roll) was:

[24, 71, 133, 261, 308, 347, 499, 543, 722, 852]

UPDATE: Alok's answer and Dan Dyer's comment point out that using an exponential distribution for the deltas would give a uniform distribution of integers.

like image 327
Phil H Avatar asked Dec 08 '09 10:12

Phil H


4 Answers

So you are asking if the numbers generated in this way are going to be uniformly distributed.

You are generating a series:

yj = ∑i=0j ( xi / A )

where A is the sum of all xi. xi is the list of (positive) deltas.

This can be done iff xi are exponentially distributed (with any fixed mean). So, if xi are uniformly distributed, the resulting yj will not be uniformly distributed.

Having said that, it's fairly easy to generate exponential xi values.

One example would be:

sum := 0
for I = 1 to N do:
    X[I] = sum = sum - ln(RAND)
sum = sum - ln(RAND)
for I = 1 to N do:
    X[I] = X[I]/sum

and you will have your random numbers sorted in the range [0, 1).

Reference: Generating Sorted Lists of Random Numbers. The paper has other (faster) algorithms as well.

Of course, this generates floating-point numbers. For uniform distribution of integers, you can replace sum above by sum/RANGE in the last step (i.e., the R.H.S becomes X[I]*RANGE/sum, and then round the numbers to the nearest integer).

like image 114
Alok Singhal Avatar answered Nov 15 '22 23:11

Alok Singhal


A uniform distribution has an upper and a lower bound. If you use your proposed method, and your deltas happen to be chosen large enough that you run into the upper bound before you have generated all your numbers, what would your algorithm do next?

Having said that, you may want to investigate the Poisson distribution, which is the distribution of interval times between random events occurring with a given average frequency.

like image 33
Greg Hewgill Avatar answered Nov 16 '22 01:11

Greg Hewgill


If you take the number range of being 1 to 1000, and you have to use 100 of these numbers, the delta will have to be as a minimum 10, otherwise you can not reach the 1000 mark. How about some working to demonstrate it in action...

The chance of any given number in an evenly distributed random selection is 100/1000 e.g. 1/10 - no shock there, take that as the basis.

Assuming you start using a delta and that delta is just 10.

The odds of getting the number 1 is 1/10 - seems fine. The odds of getting the number 2 is 1/10 + (1/10 * 1/10) (because you could hit 2 deltas of 1 in a row, or just hit a 2 as the first delta.) The odds of getting the number 3 is 1/10 + (1/10 * 1/10 * 1/10) + (1/10 * 1/10) + (1/10 * 1/10)

The first case was a delta of 3, the second was hitting 3 deltas of 1 in a row, the third case would be a delta of 1 followed by a 2, and the fourth case was a delta of 2 followed by a 1.

For the sake of my fingers typing, we won't generate the combinations that hit 5.

Immediately the first few numbers have a greater percentage chance than the straight random.

This could be altered by changing the delta value so the fractions are all different, but I do not believe you could find a delta that produced identical odds.

To give an analogy that might just sink it, if you consider your delta as just 6 and you run that twice it is the equivalent of throwing 2 dice - each of the deltas is independant, but you know that 7 has a higher chance of being selected than 2.

like image 40
Andrew Avatar answered Nov 16 '22 00:11

Andrew


I think it will be extremely similar but the extremes will be different because of the normalization. For example, 100 numbers chosen at random between 1 and 100 could all be 1. However, 100 numbers created using your system could all have deltas of 0.01 but when you normalize them you'll scale them up to be in the range 1 -> 100 which will mean you'll never get that strange possibility of a set of very low numbers.

like image 37
Benj Avatar answered Nov 16 '22 01:11

Benj