Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating unique non guessable base 36 id

For an application similar to URL shortener services, I want to create non guessable id's, which you're all familiar with I presume. Here's an example of such an id:

http://example.com/sd23t9

What is a good and efficient technique to generate these, with minimal (or no) risk of collision when inserting these as a primary key in a database table?

EDIT:
Piskvor makes an excellent point of course. I should have mentioned that I meant minimal collision risk before the 36^6 limit is met.

EDIT 2
Eh, scrap that, his point was illustrating much more than that of course. Hmmm. Pregenerating a table with id's then, perhaps (like I've read elsewhere already)? Would that be the most efficient technique when i'm bound to the 36^6 and none-consecutive constraints perhaps?

like image 755
Decent Dabbler Avatar asked Dec 17 '22 18:12

Decent Dabbler


2 Answers

Set ID length. // e.g. 6
do {
  Generate a short random ID of the given length
  Collision?
   - No:
      DONE!
   - Yes:
      increase ID length
 } while true

For any finite ID length, there's always a risk of collision: assuming from your example that you'd have [a-z0-9]{6} IDs, as soon as you have 2,176,782,336 IDs, a collision is 100% guaranteed (no more available keys). Due to birthday problem, you'll get collisions much, much sooner. With such a small keyspace, there isn't a way to avoid collisions - you'll need to have some sort of collision recovery instead.

You could generate an ID until it doesn't collide - but this will get progressively slower as the keyspace is being filled: imagine a keyspace of [a-z], with [a-n] and [p-z] already taken - now every new random ID is more likely to collide than not; and when you completely fill the keyspace, the loop will not terminate at all.

The algorithm I propose is maybe overcautious in this regard: if it finds a collision, it will try progressively longer IDs (as it assumes "collision => not feasible to check the shorter keyspace"). Although it's not efficient, it's likely to find a non-colliding ID within a few iterations.

like image 61
Piskvor left the building Avatar answered Dec 19 '22 08:12

Piskvor left the building


A bit of a weird idea. Why not to use permutations? e.g. you have a set of values [0-9a-z] when you generate first id. you take a first permutation in lexicographical order. then second and so on. to make it appear less guessable you can change the rules of lexicographic order. say 'a' goes after 't' or something like that. you can use tuple instead of a full permutations as well. This will ensure there is no collisions.

What this idea is actually about is making some sort of a two way hash function. basically if you are able to encode number "1" in some way to get something like "q8d3dw" and be able to decode "q8d3dw" back to "1" you can be certain that this function will give you unique strings for all the values from 1 to 36^6.

The problem is actually selecting this function. The easy way would be to associate "1" to "000000", "2" to "000001", "12" to "00000b". Basically arrange all available strings in lexicographical order and select the string on a position which is equal to the id. This however is really easy to guess. So what you could do is to artificially change rules of lexicographic order. Say instead of having a normal order(0,1,2,3...a,b,c...x,y,z) you could shuffle it a bit and get something like (a,5,t,3...). This will produce a bit more obfuscated results. However it will still be quite guessable because the first element would be "aaaaaa", second "aaaaa5", then "aaaaat". So you can change rules of lexicographic order even further, making them depended on the position of the character. Say order for the first char id (a,5,t,3...) for the second one (y,7,3,r...), etc.

Now, i will not post any pseudo code as it will be quite a long one. And I'm not advising you to go with that route unless you are interested in creating this sort of algorithms for fun :). However if you will go with this route it could be a very efficient way of generating this id's with no chance of collision. And I'll advise you to read volume 4 of "Art of computer programming" by Dr Donald Knuth. There are lots of suggestions in implementation of such algorithms.

like image 24
Ivan Avatar answered Dec 19 '22 06:12

Ivan