Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating your own TinyURL

Tags:

php

mysql

I have just found this great tutorial as it is something that I need.

However, after having a look, it seems that this might be inefficient. The way it works is, first generate a unique key then check if it exists in the database to make sure it really is unique. However, the larger the database gets the slower the function gets, right?

Instead, I was thinking, is there a way to add ordering to this function? So all that has to be done is check the previous entry in the DB and increment the key. So it will always be unique?

function generate_chars()

{

    $num_chars = 4; //max length of random chars
    $i = 0;
    $my_keys = "123456789abcdefghijklmnopqrstuvwxyz"; //keys to be chosen from
    $keys_length = strlen($my_keys);
    $url  = "";
    while($i<$num_chars)
    {
        $rand_num = mt_rand(1, $keys_length-1);
        $url .= $my_keys[$rand_num];
        $i++;
    }
    return $url;
}

function isUnique($chars)

{
    //check the uniqueness of the chars
    global $link;
    $q = "SELECT * FROM `urls` WHERE `unique_chars`='".$chars."'";
    $r = mysql_query($q, $link);
    //echo mysql_num_rows($r); die();
    if( mysql_num_rows($r)>0 ): 
        return false;
    else: 
        return true;
    endif;
}
like image 632
Abs Avatar asked Nov 27 '22 00:11

Abs


2 Answers

The tiny url people like to use random tokens because then you can't just troll the tiny url links. "Where does #2 go?" "Oh, cool!" "Where does #3 go?" "Even cooler!" You can type in random characters but it's unlikely you'll hit a valid value.

Since the key is rather sparse (4 values each having 36* possibilities gives you 1,679,616 unique values, 5 gives you 60,466,176) the chance of collisions is small (indeed, it's a desired part of the design) and a good SQL index will make the lookup be trivial (indeed, it's the primary lookup for the url so they optimize around it).

If you really want to avoid the lookup and just unse auto-increment you can create a function that turns an integer into a string of seemingly-random characters with the ability to convert back. So "1" becomes "54jcdn" and "2" becomes "pqmw21". Similar to Base64-encoding, but not using consecutive characters.

(*) I actually like using less than 36 characters -- single-cased, no vowels, and no similar characters (1, l, I). This prevents accidental swear words and also makes it easier for someone to speak the value to someone else. I even map similar charactes to each other, accepting "0" for "O". If you're entirely machine-based you could use upper and lower case and all digits for even greater possibilities.

like image 180
Talljoe Avatar answered Dec 05 '22 22:12

Talljoe


In the database table, there is an index on the unique_chars field, so I don't see why that would be slow or inefficient.

UNIQUE KEY `unique_chars` (`unique_chars`)

Don't rush to do premature optimization on something that you think might be slow.

Also, there may be some benefit in a url shortening service that generates random urls instead of sequential urls.

like image 23
CoderDennis Avatar answered Dec 05 '22 22:12

CoderDennis