Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating non-repeating random numbers in Python

Ok this is one of those trickier than it sounds questions so I'm turning to stack overflow because I can't think of a good answer. Here is what I want: I need Python to generate a simple a list of numbers from 0 to 1,000,000,000 in random order to be used for serial numbers (using a random number so that you can't tell how many have been assigned or do timing attacks as easily, i.e. guessing the next one that will come up). These numbers are stored in a database table (indexed) along with the information linked to them. The program generating them doesn't run forever so it can't rely on internal state.

No big deal right? Just generate a list of numbers, shove them into an array and use Python "random.shuffle(big_number_array)" and we're done. Problem is I'd like to avoid having to store a list of numbers (and thus read the file, pop one off the top, save the file and close it). I'd rather generate them on the fly. Problem is that the solutions I can think of have problems:

1) Generate a random number and then check if it has already been used. If it has been used generate a new number, check, repeat as needed until I find an unused one. Problem here is that I may get unlucky and generate a lot of used numbers before getting one that is unused. Possible fix: use a very large pool of numbers to reduce the chances of this (but then I end up with silly long numbers).

2) Generate a random number and then check if it has already been used. If it has been used add or subtract one from the number and check again, keep repeating until I hit an unused number. Problem is this is no longer a random number as I have introduced bias (eventually I will get clumps of numbers and you'd be able to predict the next number with a better chance of success).

3) Generate a random number and then check if it has already been used. If it has been used add or subtract another randomly generated random number and check again, problem is we're back to simply generating random numbers and checking as in solution 1.

4) Suck it up and generate the random list and save it, have a daemon put them into a Queue so there are numbers available (and avoid constantly opening and closing a file, batching it instead).

5) Generate much larger random numbers and hash them (i.e. using MD5) to get a smaller numeric value, we should rarely get collisions, but I end up with larger than needed numbers again.

6) Prepend or append time based information to the random number (i.e. unix timestamp) to reduce chances of a collision, again I get larger numbers than I need.

Anyone have any clever ideas that will reduce the chances of a "collision" (i.e. generating a random number that is already taken) but will also allow me to keep the number "small" (i.e. less than a billion (or a thousand million for your europeans =)).

Answer and why I accepted it:

So I will simply go with 1, and hope it's not an issue, however if it is I will go with the deterministic solution of generating all the numbers and storing them so that there is a guarentee of getting a new random number, and I can use "small" numbers (i.e. 9 digits instead of an MD5/etc.).

like image 394
bigredbob Avatar asked Jan 16 '10 09:01

bigredbob


People also ask

How do you generate random numbers without repeating in Python?

In Python, you can easily generate random numbers without repeating. To generate random numbers without repeating in Python, you can use the random module function choices(). choices() takes a list and the number of random numbers you want to generate.

How do I make Randint not repeat?

pop() to get the next " randint without repetition". This is pretty much how you get a random card without repetition from a deck (which indeed the very name of random.


2 Answers

This is a neat problem, and I've been thinking about it for a while (with solutions similar to Sjoerd's), but in the end, here's what I think:

Use your point 1) and stop worrying.

Assuming real randomness, the probability that a random number has already been chosen before is the count of previously chosen numbers divided by the size of your pool, i.e. the maximal number.

If you say you only need a billion numbers, i.e. nine digits: Treat yourself to 3 more digits, so you have 12-digit serial numbers (that's three groups of four digits – nice and readable).

Even when you're close to having chosen a billion numbers previously, the probability that your new number is already taken is still only 0,1%.

Do step 1 and draw again. You can still check for an "infinite" loop, say don't try more than 1000 times or so, and then fallback to adding 1 (or something else).

You'll win the lottery before that fallback ever gets used.

like image 79
balpha Avatar answered Sep 19 '22 06:09

balpha


You could use Format-Preserving Encryption to encrypt a counter. Your counter just goes from 0 upwards, and the encryption uses a key of your choice to turn it into a seemingly random value of whatever radix and width you want.

Block ciphers normally have a fixed block size of e.g. 64 or 128 bits. But Format-Preserving Encryption allows you to take a standard cipher like AES and make a smaller-width cipher, of whatever radix and width you want (e.g. radix 10, width 9 for the parameters of the question), with an algorithm which is still cryptographically robust.

It is guaranteed to never have collisions (because cryptographic algorithms create a 1:1 mapping). It is also reversible (a 2-way mapping), so you can take the resulting number and get back to the counter value you started with.

AES-FFX is one proposed standard method to achieve this.

I've experimented with some basic Python code for AES-FFX--see Python code here (but note that it doesn't fully comply with the AES-FFX specification). It can e.g. encrypt a counter to a random-looking 7-digit decimal number. E.g.:

0000000   0731134 0000001   6161064 0000002   8899846 0000003   9575678 0000004   3030773 0000005   2748859 0000006   5127539 0000007   1372978 0000008   3830458 0000009   7628602 0000010   6643859 0000011   2563651 0000012   9522955 0000013   9286113 0000014   5543492 0000015   3230955 ...       ... 

For another example in Python, using another non-AES-FFX (I think) method, see this blog post "How to Generate an Account Number" which does FPE using a Feistel cipher. It generates numbers from 0 to 2^32-1.

like image 27
Craig McQueen Avatar answered Sep 22 '22 06:09

Craig McQueen