Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient unique key generation for database entries

I'm currently developing a registration system prototype. It's very simplistic and essentially is just a .NET form which gets written into MongoDB.

What I'm stuck with is an efficient way to generate a unique id/key for each user. These ids must be human friendly so something like a 7 character long alphanumeric string e.g. A1B2C3X.

The solutions I've seen so far just use a simple function to generate a random string and then check the database to see if it is unique (and if not repeat untill you find one that is unique). This of course will become more and more computationally expensive as the number of database entries grows.

My idea is to precompute the unique id set and store that in another database. Then when I need to add a new entry into the user database I can "pop" an id from my id databse (in constant time) and know that it does not already exist in the user database without the need to search it.

I'm sure somebody must have done something like this before. Is there a better way? I don't know why I'm struggling so much with this. Your input is very much appreciated.

like image 838
Luke Ellis Avatar asked Mar 11 '12 12:03

Luke Ellis


1 Answers

Generating a random string in the application and checking if it's unique is not a bad solution. Don't worry about it being inefficient, it's not -- and definitely not compared to the alternatives. It will certainly be faster than running db.user.count() or keeping a separate table with precalculated IDs. You just need to do it right.

First of all, how often will new users be created? Probably not very often compared to other things, so really the whole efficiency discussion is moot. Secondly, with 7 characters A-Z, 0-9 that's a range of 36^7 or somewhere around 78 billion. It will be some time before you will start seeing collisions, to say the least.

If you just do it like this, it will not incur any performance penalties unless there's a collision (which is extremely unlikely):

  • Generate a unique user ID
  • Insert your user object, using the user ID as the value of _id
  • Check for duplicate key errors (how to do this depends on the language and driver, but might involve running the getLastError command).
  • On a duplicate key error start over by generating a new user ID

This way there will only be extra work in the event of a collision (and I really, really want to stress how incredibly unlikely that will be).

There's another way of generating a unique user ID: take the current UNIX timestamp (down to the second), append a hash of the hostname and then the process ID, and finally the current value of a counter. This is in fact how Mongo's ObjectId is generated, and guarantees that you can generate as many objects per second, per process, as the max value of your counter (which in Mongo is 3 bytes, so 16 million). See the docs on ObjectId if you're interested in the details: http://www.mongodb.org/display/DOCS/Object+IDs

It has the property that your user IDs will naturally sort in order of creation, but it's 12 bytes long, so a bit longer than your 7 chars, unfortunately. You can use the same method and skip the hostname/pid, and shorten the counter (which can also be a random number if you like) to two bytes, then you would be down to 6 bytes, which could probably be squeezed into about 9 chars A-Z, 0-9.

like image 178
Theo Avatar answered Oct 29 '22 01:10

Theo