I'm working on an application where I need to generate unique, non-sequential IDs. One of the constraints I have is that they must consist of 3 digits followed by 2 letters (only about 600k IDs). Given my relatively small pool of IDs I was considering simply generating all possible IDs, shuffling them and putting them into a database. Since, internally, I'll have a simple, sequential, ID to use, it'll be easy to pluck them out one at a time & be sure I don't have any repeats.
This doesn't feel like a very satisfying solution. Does anyone out there have a more interesting method of generating unique IDs from a limited pool than this 'lottery' method?
This can be done a lot of different ways, depending on what you are trying to optimize (speed, memory usage, etc.).
ID pattern = ddd c1c[0]
Option 1 (essentially like hashing, similar to Zak's):
1 Generate a random number between 0 and number of possibilities (676k).
2- Convert number to combination
ddd = random / (26^2)
c[0] = random % (26)
c[1] = (random / 26) % 26
3- Query DB for existence of ID and increment until a free one is found.
Option 2 (Linear feedback shift register, see wikipedia):
1- Seed with a random number in range (0,676k). (See below why you can't seed with '0')
2- Generate subsequent random numbers by applying the following to the current ID number
num = (num >> 1) ^ (-(num & 1u) & 0x90000u);
3- Skip IDs larger than range (ie 0xA50A0+)
4- Convert number into ID format (as above)
*You will need to save the last number generated that was used for an ID, but you won't need to query the DB to see if it is used. This solution will enumerate all possible IDs except [000 AA] due to the way the LFSR works.
[edit] Since your range is actually larger than you need, you can get back [000 AA] by subtracting 1 before you convert to the ID and have your valid range be (0,0xA50A0]
Use a finite group. Basically, take a 32 or 64-bit integer, and find a large number that is coprime to the maximum value for your integer; call this number M. Then, for all integers n, n * M will result in a unique number that has lots of digits.
This has the advantage that you don't need to pre-fill the database, or run a separate select query -- you can do this all from within one insert statement, by having your n
just be an auto-increment, and have a separate ID column that defaults to the n * M.
You could generate a random ID conforming to that standard, do a DB select to see if it exists already, then insert it into a DB to note it has been "used". For the first 25% of the life of that scheme (or about 150k entries), it should be relatively fast to generate new random ID's. After that though, it will take longer and longer, and you might as well pre-fill the table to look for free IDs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With