Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP URL Shortening Algorithm

Could anyone recommend a preferred algorithm to use for URL shortening? I'm coding using PHP. Initially I thought about writing something that would start at a character such as "a" and iterate through requests, creating records in a database and therefore having to increment the character to b, c, d ... A, B and so on as appropriate.

However it dawned on me that this algorithm could be pretty heavy/clumsy and there could be a better way to do it.

I read around a bit on Google and some people seem to be doing it with base conversion from the database's ID column. This isn't something I'm too familiar with.

Could someone elaborate and explain to me how this would work? A couple of code examples would be great, too.

I obviously don't want a complete solution as I would like to learn by doing it myself, but just an explanation/pseudo-code on how this would work would be excellent.

like image 206
George Avatar asked Aug 18 '10 15:08

George


People also ask

Which algorithm is used to shorten URL?

To generate a unique short URL, we can compute it using the Unique Hash(MD5, SHA256, etc.) of the original URL and then encode using base62. If we use the MD5 algorithm as our hash function, it'll produce a 128-bit hash value. After base62 encoding, we'll get a string having more than seven characters.

How do you shorten a URL programmatically?

One Simple Solution could be Hashing. Use a hash function to convert long string to short string. In hashing, that may be collisions (2 long URLs map to same short URL) and we need a unique short URL for every long URL so that we can access long URL back.

How does shortening a URL work?

URL shorteners work by creating a redirect to your long URL. Entering a URL into your internet browser sends an HTTP request to the web server to pull up a specific website. The long and the short URLs are both simply different starting points for an internet browser to get the same destination.


2 Answers

Most shortening services just use a counter that is incremented with every entry and convert the base from 10 to 64.

An implementation in PHP could look like this:

function encode($number) {
    return strtr(rtrim(base64_encode(pack('i', $number)), '='), '+/', '-_');
}
function decode($base64) {
    $number = unpack('i', base64_decode(str_pad(strtr($base64, '-_', '+/'), strlen($base64) % 4, '=')));
    return $number[1];
}

$number = mt_rand(0, PHP_INT_MAX);
var_dump(decode(encode($number)) === $number);

The encode function takes an integer number, converts it into bytes (pack), encodes it with the Base-64 encoding (base64_encode), trims the trailing padding = (rtrim), and replaces the characters + and / by - and _ respectively (strtr). The decode function is the inverse function to encode and does the exact opposite (except adding trailing padding).

The additional use of strtr is to translate the original Base-64 alphabet to the URL and filename safe alphabet as + and / need to be encoded with the Percentage-encoding.

like image 186
Gumbo Avatar answered Sep 21 '22 18:09

Gumbo


You can use base_convert function to do a base convertion from 10 to 36 with the database IDs.

<?php
   $id = 315;
   echo base_convert($id, 10, 36), "\n";
?>

Or you can reuse some of the ideas presented in the comments on the page bellow:

http://php.net/manual/en/function.base-convert.php

like image 35
hgf Avatar answered Sep 23 '22 18:09

hgf