Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How Does The Google URL Shortener Generate A 5 Digit Hash Without Collisions

How can the Google URL shortener generate a unique hash with five characters without collisions. Seems like there are bound to be collisions, where different urls generate the same hash.

stackoverflow.com => http://goo.gl/LQysz

What's also interesting, is the same URL, generates a completely different hash each time:

stackoverflow.com => http://goo.gl/Dl7sz

So, doing some math, using lower-case characters, upper-case characters, and digits, the total number of combinations are 62^5 = 916,132,832 clearly collisions bound to happen.

How does Google do this?

like image 664
Justin Avatar asked Nov 03 '11 01:11

Justin


2 Answers

They have a hash table with hash to url.

Count the number of rows in that table and encrypt it with a stream cipher then encode with base62.

Using a stream cipher instead of a hash will give you a short pseudo random output that doesn't collide with any previous output so you don't need to check the table.

like image 186
Walker Communications Avatar answered Oct 29 '22 03:10

Walker Communications


They have a database which tracks all previously generated URLs and the longer URL that each of those maps to. Easy to make sure that newly generated URLs don't already exist in that table. A little tricky to scale out (they surely have multiple servers so each one needs to be assigned a bucket of values from which it can give out to users). If they ever reach the point of having generated 916,132,832 URLs, they'll just add another character.

like image 26
Robert Levy Avatar answered Oct 29 '22 04:10

Robert Levy