I would like to generate a short, unique ID without having to check for collisions.
I currently do something like this, but the ID I currently generate is random and checking for collisions in a loop is annoying and will get expensive if the number of records grows significantly.
Normally worrying about collisions isn't an issue, but the unique ID I want to generate is a short unique string 5-8 characters, alpha-numeric, like tinyurl does.
EDIT: I would like to start out with 5 characters and if I hit 60 million entries, then go to 6.. so on and so forth.
To this end, I was thinking I could use an auto_increment value that is hidden from the users, and present them instead with an MD5
or some other method to generate a unique string from that.
Generated strings should not appear to be linear, so simply converting the auto_incremented ID into base 36
[0-9A-Z] is a bit too simplistic, but a function something like that is where I'm going with this.
EDIT: Security is not an issue as this will not be used to secure information. It is simply a shortcut to a longer string. Thank you.
Thank you for your suggestions and sorry for the delay. Dentist..
You'll need something that's correct by construction, i.e. a permutation function: this is a function that does a one-to-one, reversible mapping of one integer (your sequential counter) to another. Some examples (any combination of these should also work):
Example: this function will convert 0, 1, 2, 3, 5, .. into 13, 4, 12, 7, 15, .. for numbers up to 15:
$i=($input+97) & 0xf;
$result=((($i&0x1) << 3) + (($i&0xe) >> 1)) ^ 0x5;
EDIT
An easier way would to use a linear congruential generator (LCG, which is usually used for generating random numbers), which is defined by a formula of the form:
X_n+1 = (a * X_n + c) mod m
For good values of a, c and m, the sequence of X_0, X_1 .. X_m-1 will contain all numbers between 0 and m-1 exactly once. Now you can start from a linearly increasing index, and use the next value in the LCG sequence as your "secret" key.
EDIT2
Implementation: You can design your own LCG parameters, but if you get it wrong it won't cover the full range (and thus have duplicates) so I'll use a published and tried set of parameters here from this paper:
a = 16807, c = 0, m = 2147483647
This gives you a range of 2**31. With pack() you can get the resulting integer as a string, base64_encode() makes it a readable string (of up to 6 significant characters, 6 bits per byte) so this could be your function:
substr(base64_encode(pack("l", (16807 * $index) % 2147483647)), 0, 6)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With