I'm working with a client that needs to generate millions of the alphanumeric codes used in magazine scratch-off cards, bottlecap prizes, and so on. They have to be short enough to print on a cap, they want to make sure that ambiguous characters like 1 and I, 0 and O, etc. are not included, and they have to be explicitly stored for future use -- we can't just have an algorithm that determines 'validity' when someone tries to redeem one. Finally, they want to make sure that the codes are randomly distributed inside of a large "code space" so that people can't just guess additional codes by walking through the alphabet.
Are there any pointers towards reasonably efficient algorithms for generating these kinds of code sets? I've scratched a few out on the back of an envelope, but this problem smells like a trap for the unwary.
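The uniqid() function generates an ID based on the microtime (the current time in microseconds). Note: this function does not guarantee uniqueness of the return value, and because the ID is derived from the clock it is also predictable. Hashing it with md5() does not make it meaningfully harder to guess; for codes that must be hard to predict, use a cryptographically secure source such as random_bytes() or random_int().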
The UUID() function in MySQL returns a Universal Unique Identifier (UUID) generated according to RFC 4122, “A Universally Unique Identifier (UUID) URN Namespace”. It is designed as a number that is universally unique: two UUID values are expected to be distinct, even if they are generated on two independent servers.
If you need about 10 million unique keys (for example), the best approach is to pick a key-space that's exponentially bigger, and start randomly generating. Read about the Birthday Paradox -- it's the main thing you should be worried about. If you want 2^n unique and secure keys, make sure there are at least 2^(2 * n) possible values. Here's a rough O(n log n) algorithm:
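Pseudocode for the duplicate check, run over the generated keys after sorting them:
// Assumes getnext() returns the keys in sorted order, then null at the end.
$toDelete = [];
$last = null;
while (($current = getnext()) !== null) {
    if ($last === $current) {
        // Equal neighbours in a sorted list are duplicates.
        $toDelete[] = $current;
    }
    $last = $current;
}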
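For comparison, here is a minimal Python sketch of the same generate, sort, and scan idea. The alphabet, the function names, and the top-up loop are assumptions made for illustration, not something the answer above specifies; the codes are drawn with the standard-library secrets module so they are hard to predict.

```python
import secrets

# Assumed alphabet: digits and upper-case letters with the look-alikes
# 0/O and 1/I/L left out (31 symbols). Swap in whatever set the client approves.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"


def random_code(length=8):
    """Draw one code from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def split_duplicates(sorted_codes):
    """Scan a sorted list once; return (kept, to_delete)."""
    kept, to_delete = [], []
    last = None
    for current in sorted_codes:
        if current == last:
            to_delete.append(current)  # same as its neighbour: a duplicate
        else:
            kept.append(current)
        last = current
    return kept, to_delete


def generate_unique(count, length=8):
    """Generate, sort, drop duplicates, and top up until `count` codes remain."""
    unique = []
    while len(unique) < count:
        batch = [random_code(length) for _ in range(count - len(unique))]
        unique, _ = split_duplicates(sorted(unique + batch))  # sort is O(n log n)
    return unique
```

The result comes back sorted, so it would need shuffling before being handed out in sequence. With a large enough code space the top-up loop should rarely need more than a pass or two.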
Let's suppose you can use a character set of, say, 40 symbols of unambiguous upper, lower and numeric characters.
For a sequence of n chars, you've got 40^n combinations.
Thus 8 chars gives a pretty good space to work in: 40^8 = 6,553,600,000,000 possible codes. If you generated 10 million codes, someone guessing at random would have to try about 655,360 combinations on average to brute force a single valid code.
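The arithmetic is easy to check for other alphabet sizes and code lengths. A small Python sketch, with function names that are purely illustrative:

```python
def code_space(alphabet_size, length):
    """Number of distinct codes of the given length."""
    return alphabet_size ** length


def guesses_per_hit(alphabet_size, length, issued):
    """Average number of random guesses needed to hit one issued code."""
    return code_space(alphabet_size, length) / issued


print(code_space(40, 8))                   # 6553600000000
print(guesses_per_hit(40, 8, 10_000_000))  # 655360.0
```

Whether odds of that order are acceptable depends on how quickly someone can submit guesses, so rate limiting on redemption matters as much as the code length.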
Or you come at it from the other direction - given the number of possible codes, how many codes should you generate to avoid the trap they call the Birthday Paradox?
Taking the 8 char code, 6,553,600,000,000 is approx 2^42, thus you might reasonably generate 2^21 codes from it, or 2,097,152.
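To put a number on that rule of thumb, the usual birthday approximation 1 - exp(-k(k-1)/(2N)) estimates the chance of at least one collision among k random codes drawn from a space of N. The sketch below is a rough check under that approximation, with illustrative function names:

```python
import math


def collision_probability(issued, space):
    """Approximate chance that `issued` random codes contain a repeat."""
    exponent = issued * (issued - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate for small values


def sqrt_batch_size(space):
    """Square-root rule of thumb for how many codes to draw from the space."""
    return math.isqrt(space)


space = 40 ** 8
print(sqrt_batch_size(space))                  # 2560000
print(collision_probability(2 ** 21, space))   # a little under 0.3
```

So even at the square-root size a collision somewhere in the batch is far from impossible, which is why the generated set still needs an explicit duplicate check rather than relying on the size of the space alone.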