Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Random number generator that returns only one number each time

Does Python have a random number generator that returns only one random integer number each time when next() function is called? Numbers should not repeat and the generator should return random integers in the interval [1, 1 000 000] that are unique.

I need to generate more than million different numbers and that sounds as if it is very memory consuming in case all the number are generated at same time and stored in a list.

like image 550
Primoz Avatar asked Dec 19 '22 07:12

Primoz


1 Answers

You are looking for a linear congruential generator with a full period. This will allow you to get a pseudo-random sequence of non-repeating numbers in your target number range.

Implementing a LCG is actually very simple, and looks like this:

def lcg(a, c, m, seed = None):
    num = seed or 0
    while True:
        num = (a * num + c) % m
        yield num

Then, it just comes down to choosing the correct values for a, c, and m to guarantee that the LCG will generate a full period (which is the only guarantee that you get non-repeating numbers). As the Wikipedia article explains, the following three conditions need to be true:

  1. m and c need to be relatively prime.
  2. a - 1 is divisible by all prime factors of m
  3. a - 1 is divisible by 4, if m is also divisible by 4.

The first one is very easily guaranteed by simply choosing a prime for c. Also, this is the value that can be chosen last, and this will ultimately allow us to mix up the sequence a bit.

The relationship between a - 1 and m is more complicated though. In a full period LCG, m is the length of the period. Or in other words, it is the number range your numbers come from. So this is what you are usually choosing first. In your case, you want m to be around 1000000. Choosing exactly your maximum number might be difficult since that restricts you a lot (in both your choice of a and also c), so you can also choose numbers larger than that and simply skip all numbers outside of your range later.

Let’s choose m = 1000000 now though. The prime factors of m are 2 and 5. And it’s also obviously divisible by 4. So for a - 1, we need a number that is a multiple of 2 * 2 * 5 to satisfy the conditions 2 and 3. Let’s choose a - 1 = 160, so a = 161.

For c, we are using a random prime that’s somewhere in between of our range: c = 506903

Putting that into our LCG gives us our desired sequence. We can choose any seed value from the range (0 <= seed <= m) as the starting point of our sequence.

So let’s try it out and verify that what we thought of actually works. For this purpose, we are just collecting all numbers from the generator in a set until we hit a duplicate. At that point, we should have m = 1000000 numbers in the set:

>>> g = lcg(161, 506903, 1000000)
>>> numbers = set()
>>> for n in g:
        if n in numbers:
            raise Exception('Number {} already encountered before!'.format(n))
        numbers.add(n)

Traceback (most recent call last):
  File "<pyshell#5>", line 3, in <module>
    raise Exception('Number {} already encountered before!'.format(n))
Exception: Number 506903 already encountered before!
>>> len(numbers)
1000000

And it’s correct! So we did create a pseudo-random sequence of numbers that allowed us to get non-repeating numbers from our range m. Of course, by design, this sequence will be always the same, so it is only random once when you choose those numbers. You can switch up the values for a and c to get different sequences though, as long as you maintain the properties mentioned above.


The big benefit of this approach is of course that you do not need to store all the previously generated numbers. It is a constant space algorithm as it only needs to remember the initial configuration and the previously generated value.

It will also not deteriorate as you get further into the sequence. This is a general problem with solutions that just keep generating a random number until a new one is found that hasn’t been encountered before. This is because the longer the list of generated numbers gets, the less likely you are going to hit a numbers that’s not in that list with an evenly distributed random algorithm. So getting the 1000000th number will likely take you a long time to generate with memory based random generators.

But of course, having this simply algorithm which just performs some multiplication and some addition does not appear very random. But you have to keep in mind that this is actually the basis for most pseudo-random number generators out there. So random.random() uses something like this internally. It’s just that the m is a lot larger, so you don’t notice it there.

like image 187
poke Avatar answered Dec 21 '22 09:12

poke