Could anyone recommend a preferred algorithm to use for URL shortening? I'm coding using PHP. Initially I thought about writing something that would start at a character such as "a" and iterate through requests, creating records in a database and therefore having to increment the character to b, c, d ... A, B and so on as appropriate.
However it dawned on me that this algorithm could be pretty heavy/clumsy and there could be a better way to do it.
I read around a bit on Google and some people seem to be doing it with base conversion from the database's ID column. This isn't something I'm too familiar with.
Could someone elaborate and explain to me how this would work? A couple of code examples would be great, too.
I obviously don't want a complete solution as I would like to learn by doing it myself, but just an explanation/pseudo-code on how this would work would be excellent.
To generate a unique short URL, we can compute it using the Unique Hash(MD5, SHA256, etc.) of the original URL and then encode using base62. If we use the MD5 algorithm as our hash function, it'll produce a 128-bit hash value. After base62 encoding, we'll get a string having more than seven characters.
One Simple Solution could be Hashing. Use a hash function to convert long string to short string. In hashing, that may be collisions (2 long URLs map to same short URL) and we need a unique short URL for every long URL so that we can access long URL back.
URL shorteners work by creating a redirect to your long URL. Entering a URL into your internet browser sends an HTTP request to the web server to pull up a specific website. The long and the short URLs are both simply different starting points for an internet browser to get the same destination.
Most shortening services just use a counter that is incremented with every entry and convert the base from 10 to 64.
An implementation in PHP could look like this:
function encode($number) {
return strtr(rtrim(base64_encode(pack('i', $number)), '='), '+/', '-_');
}
function decode($base64) {
$number = unpack('i', base64_decode(str_pad(strtr($base64, '-_', '+/'), strlen($base64) % 4, '=')));
return $number[1];
}
$number = mt_rand(0, PHP_INT_MAX);
var_dump(decode(encode($number)) === $number);
The encode
function takes an integer number, converts it into bytes (pack
), encodes it with the Base-64 encoding (base64_encode
), trims the trailing padding =
(rtrim
), and replaces the characters +
and /
by -
and _
respectively (strtr
). The decode
function is the inverse function to encode
and does the exact opposite (except adding trailing padding).
The additional use of strtr
is to translate the original Base-64 alphabet to the URL and filename safe alphabet as +
and /
need to be encoded with the Percentage-encoding.
You can use base_convert function to do a base convertion from 10 to 36 with the database IDs.
<?php
$id = 315;
echo base_convert($id, 10, 36), "\n";
?>
Or you can reuse some of the ideas presented in the comments on the page bellow:
http://php.net/manual/en/function.base-convert.php
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With