I am trying to come up with a "smart" and "secure" way of generating about 63million unique codes to be used in a competition. The codes will be about 10 characters long.
Has anyone done anything similar or aware of any "hurdles" that might arise from this issues? How do we minimize the probability of someone being able to guess the codes?
This project will be done in PHP, but that won't really matter, it's more of the logic that's an issue here.
Any feedback would be really appreciated.
UPDATE Just to clarify it will be 10 characters of case insensitive Alpha Numeric Values. A-Z0-9
Syntax:
You probably will have people copying these codes, so that means these codes should be easy to copy. 10^10 is too small, as Gamecat points out. kgiannakakis has a better idea, but that causes another problem: "1" looks a lot like "I". "0", "C", "O" and "Q" are also quite similar. This is not a big problem. Define a safe alfabet: "0123456789ABDEFGHJKLMNPRSTUVXYZ" (leaves out COIQ) From the comments: depending on the fonts you pick, 5/S and U/V may also be visually ambiguous; replace as required. This is a 32 symbol (5 bit) code. A 10 character code is a 50 bits number. Those should be fairly trivial to generate, sort, copy, compare etc. Chances of being guessed are about 0.63E-7
Since the codes are too long to remember, users will need a resting point when copying them. So split the string in two or three parts and make sure the input field matches this breakdown.
E.g. AKG3L-45TEE => two groups of 5, and even if you can't remember 5 chars it's a lot easier to find the point back where you stopped reading.
How to generate them:
This is fairly simple. You don't need a particularly sophistciated algorithm to generate candidates. You can generate 10 random numbers per code needed, take 5 bits from each number (typically the middle bits are best, e.g. (rand()/64) modulo 32). Use this value [0-31] as index into your alphabet. Create a database table with this string as a primary key, and insert until the table has 63 million entries. You probably want to add "generated on" and "redeemed on" dates to this table.
If I understand you correctly, you want to create 63 milion codes of 10 digits that have a low "guess factor".
There are 10,000,000,000 valid combinations. Of these 63,000,000 are price numbers. 63 / 10,000 = 0.0063. So each guess has a 0,63% chance of success. Does not sound high, but with brute force, the numbers are quite easy to obtain.
Are you sure the 63 on 10,000 ratio is good enough?
Generate a set of truly random, unique 64-bit numbers over the range 0 - 250-1. You'll need to keep track of the ones you have seen and reject duplicates. Use each 5 bits of the lower 50 bits of this number pull from a 32-character alphabet -- basically all the letters in the English alphabet (upper or lowercase) minus L and O plus the digits 2-9 (this reduces confusion between l/1 and 0/O). For 63 million codes, this would give you a 0.000006% probability (63,000,000/250) of choosing a valid code sequence randomly.
I've also done this using an autogenerated, primary key (int) and bit-interleaving it with a 32-bit random value. In this case I used the full 64-bits to generate 13 characters from the alphabet and added two random characters at fixed positions for a 15-character code. When redeeming the code you reverse the algorithm to extract the key and the randomness, throwing away the two extra random characters, then compare the randomness to that found stored with the key to validate the code.
Be careful when using alphanumerics for codes, as you don't want to accidentally generate anything confusing or embarrassing. To avoid confusion I suggest removing 1 and L, 0 and O, and maybe 8 and B. To avoid embarrassment consider removing all vowels so that you cannot accidentally spell anything (use your imagination here).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With