
How does YouTube encode their URLs?

Quick Question

How does YouTube encode its URLs? Take the one below:

http://www.youtube.com/watch?v=MhWyAL2hKlk

What are they doing to get the value MhWyAL2hKlk?

Are they using some kind of encryption and then decrypting it at their end?

I want to do something similar on a website I am working on. The URL below looks horrible:

http://localhost:8888/example/account_player/?playlist=drum+and+bass+music

I would like to encode the URLs the way YouTube does, but I don't know how they do it.

Any advice?

user1503606 asked Aug 24 '12 14:08


1 Answer

Well, technically speaking, YouTube generates video IDs by using an algorithm, but honestly, I have no idea which one. It could be a hash of the entire video file plus a salt based on the current UNIX time, or it could be a base64 encoding of something unique to the video. I suspect it's not purely random, though, because without some kind of collision checking two videos could end up with the same ID.

For the sake of example, though, we'll assume that YouTube does generate random IDs. Keep in mind that when you use randomly generated values as keys, it is generally a good idea to implement collision checking so that a new object doesn't overwrite an existing one. In practice, though, I would recommend using a hashing algorithm, since hashes are one-way and collisions between different inputs are very unlikely.

So, I'm not very familiar with PHP. I had to write it in JavaScript first. Then, I ported it to PHP, which turned out to be relatively simple:

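// Pick one random character from $charset.
function randch($charset){
    // mt_rand() with a range avoids the modulo on rand()'s raw output.
    return $charset[mt_rand(0, strlen($charset) - 1)];
}

// Build a random string of $len characters drawn from $charset.
// The default charset is 64 URL-safe characters.
function randstr($len, $charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"){
    $out = [];

    for($i = 0; $i < $len; $i++){
        $out[] = randch($charset);
    }
    return implode("", $out);
}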

What this does is generate a random string $len characters long, with each character drawn from the given charset.

Here's some sample output:

randstr(5)              -> 1EWHd
randstr(30)             -> atcUVgfhAmM5bXz-3jgyRoaVnnY2jD
randstr(30, "asdfASDF") -> aFSdSAfsfSdAsSSddFFSSsdasDDaDa

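It's not a good idea to use such a short charset, though: fewer distinct characters means fewer possible strings of a given length. Compare three runs with a 4-character charset against three runs with the default one: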

randstr(30, "asdf")

sdadfaafsdsdfsaffsddaaafdddfad
adaaaaaafdfaadsadsdafdsfdfsadd
dfaffafaaddfdddadasaaafsfssssf

randstr(30)

r5BbvJ45HEN6dWtNZc5ZvHGLCg4Qyq
50vKb1rh66WWf9RLZQY2QrMucoNicl
Mklh3zjuRqDOnVYeEY3B0V3Moia9Dn
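
To put a number on why the short charset is worse, here is a minimal sketch that estimates how many bits of randomness an ID can carry. The helper name idSpaceBits is hypothetical and not part of the original answer; it only assumes that every character is picked uniformly and independently, as randstr does.

```php
<?php
// Hypothetical helper (not from the original answer): estimates how many
// bits of randomness an ID of $len characters can carry when each
// character is drawn uniformly from $charsetSize distinct characters.
function idSpaceBits(int $len, int $charsetSize): float {
    // Each character contributes log2($charsetSize) bits.
    return $len * log($charsetSize, 2);
}

printf("4-char charset, 30 chars:  %.0f bits\n", idSpaceBits(30, 4));
printf("64-char charset, 30 chars: %.0f bits\n", idSpaceBits(30, 64));
printf("64-char charset, 11 chars: %.0f bits\n", idSpaceBits(11, 64));
```

A 4-character charset gives 2 bits per character and the 64-character default gives 6, so for the same length the default charset covers a vastly larger space of possible IDs.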

Now let's say the page uses this function to generate a random ID for a video that was just uploaded. You then store this key in a table with a link to the relevant data, so the right page can be displayed. When an ID is requested via $_GET (e.g. /watch?v=02R0-1PWdEf), the page checks the key against the table of video IDs: if it finds a match, it grabs the data for that key; otherwise it returns a 404.
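
The store-and-look-up flow described above could be sketched roughly like this. It is only an illustration under stated assumptions: a plain PHP array stands in for the database table, and the names randomId, createVideo and findVideo are hypothetical. The do/while loop is the collision check mentioned earlier.

```php
<?php
// Sketch only: a plain array stands in for the database table.
// randomId, createVideo and findVideo are hypothetical names.

// Same idea as randstr(), repeated here so the example is self-contained.
function randomId(int $len = 11): string {
    $charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-";
    $id = "";
    for ($i = 0; $i < $len; $i++) {
        $id .= $charset[mt_rand(0, strlen($charset) - 1)];
    }
    return $id;
}

// Generate IDs until an unused one turns up, then store the record under it.
function createVideo(array &$table, array $record): string {
    do {
        $id = randomId();
    } while (isset($table[$id])); // collision check
    $table[$id] = $record;
    return $id;
}

// Return the stored record, or null if the ID is unknown.
function findVideo(array $table, string $id): ?array {
    return $table[$id] ?? null;
}

$videos = [];
$id = createVideo($videos, ["playlist" => "drum and bass music"]);

// In a real page this value would come from $_GET['v'].
$requested = $id;
$video = findVideo($videos, $requested);
if ($video === null) {
    http_response_code(404); // unknown ID
} else {
    echo "Playlist: " . $video["playlist"] . "\n";
}
```

With a real database, the isset() check would become a query against a unique column, and a unique index on that column would guard against two requests picking the same ID at the same time.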

You can also encode directly to a base64 string if you don't want it to be random. This can be done with base64_encode() and base64_decode(). For example, say you have the data for the video in one string $str="filename=apples.avi;owner=coolpixlol124", for whatever reason. base64_encode($str) will give you ZmlsZW5hbWU9YXBwbGVzLmF2aTtvd25lcj1jb29scGl4bG9sMTI0. Keep in mind that base64 is an encoding, not encryption: anyone who sees the URL can decode it.

To decode it later use base64_decode($new_str), which will give back the original string.
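
One caveat when putting base64 in a URL: the standard alphabet includes +, / and =, which have special meaning in query strings. A common workaround is the "URL-safe" variant that swaps + and / for - and _ and drops the padding. The sketch below shows one way to do that; the helper names base64UrlEncode and base64UrlDecode are hypothetical wrappers, not built-in PHP functions.

```php
<?php
// Hypothetical helpers: URL-safe wrappers around base64_encode()/base64_decode().

function base64UrlEncode(string $data): string {
    // Swap '+' and '/' for '-' and '_', then drop the '=' padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64UrlDecode(string $encoded): string {
    // Undo the character swap and restore the padding before decoding.
    $b64 = strtr($encoded, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
    return base64_decode($b64);
}

$str = "filename=apples.avi;owner=coolpixlol124";
$token = base64UrlEncode($str);

echo $token, "\n";                  // usable as ?v=<token>
echo base64UrlDecode($token), "\n"; // the original string again
```

For the example string the result happens to be the same as plain base64_encode(), because its encoding contains no +, / or = characters.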

Though, as I said before, it's probably a better idea to use a hashing algorithm like SHA.
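
As a rough illustration of the hashing approach, here is a sketch that derives an 11-character ID from a SHA-256 digest. The function name hashId, the choice of SHA-256 and the truncation to 11 characters are assumptions made for this example, not something YouTube is known to do.

```php
<?php
// Hypothetical helper: derive a short, URL-safe ID from data that is
// unique to the item (for example a file name plus the owner).
function hashId(string $uniqueData, int $len = 11): string {
    $digest = hash('sha256', $uniqueData, true);       // raw 32-byte digest
    $b64 = strtr(base64_encode($digest), '+/', '-_');  // URL-safe alphabet
    return substr($b64, 0, $len);                      // keep the first $len characters
}

echo hashId("drum and bass music"), "\n";
```

The same input always yields the same ID, so nothing random has to be generated. Since a hash can't be reversed, the ID still has to be stored next to the data it refers to, and because the digest is truncated, a collision check on insert is still worth keeping.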

I hope this helped.

EDIT: I forgot to mention, YouTube's video IDs as of now are 11 characters long, so if you want to use the same kind of thing, you would use randstr(11) to generate an 11-character random string, like this sample ID I got: 6AMx8N5r6cg

EDIT 2 (2015.12.17): Completely re-wrote answer. Original was crap, I don't know what I was thinking when I wrote it.

Braden Best answered Sep 24 '22 17:09