
How does youtube calculate the unique 11 digit code for each video [closed]

YouTube seems to have a unique 11-character code for each video. The code includes 0-9, A-Z, a-z, and a couple of symbols such as - and _.

How would they calculate this unique code for each video? I am working on something where I'd like to assign a unique code to each record, hence the question.

My questions/concerns are:

  1. If they generate it on the fly (when a video is submitted), they would have to check whether the code prepared for the video already exists. That would be an expensive operation across a dataset as huge as theirs.
  2. Would they run some sort of batch job every night or every month that creates unique codes and stashes them in the DB? Then, when a video is submitted, it just takes a code and marks it as "used".
  3. Would it make sense to take the auto-generated, auto-incremented ID column for each record in the DB and somehow convert that unique ID to an 11-character code?

My goal is to:

  • Create a unique code for a record in the table.
  • Let the user share the URL containing that unique code with anyone.
  • When someone arrives via the unique code, tie their visit to the original user who shared the URL.
Anthony asked Dec 09 '13 21:12




2 Answers

Read up on GUID and UID in general.

Most of the time, if you are using a database, it will generate a unique ID for you, and that ID can then be encoded with numbers and letters to shorten the resulting string.

http://en.wikipedia.org/wiki/Globally_unique_identifier

Shortening the string is about the way you encode the value; it doesn't actually change it.

For example, the number 15 uses two digits in base 10, one digit in hex (f), and four digits in binary (1111).

In the same way, you can use a-z, A-Z, and 0-9 to get base 62 and encode numbers into strings using far fewer digits than base 10 needs.

It's not the only approach, but (especially if you already have database rows for it) it's the simplest. You don't even need to pad to 11 characters unless you really want to, and adding any number of 0s at the start of the encoded string does not alter its value.

Java even provides functions to do this for you, although the maximum radix they support is 36:

http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#toString%28int,%20int%29
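Because the built-in radix stops at 36, a base-62 encoder has to be written by hand. Here is a minimal Python sketch of the idea described above, turning an auto-incremented integer ID into a short string and back. The function names and the alphabet order (digits, then lowercase, then uppercase) are assumptions chosen for illustration; any fixed ordering of 62 distinct symbols works, as long as encoding and decoding use the same one.

```python
import string

# Assumed alphabet order: 0-9, then a-z, then A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int, width: int = 0) -> str:
    """Encode a non-negative integer as base 62, left-padded with '0' to `width`."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Digits were collected least-significant first, so reverse them.
    encoded = "".join(reversed(digits)) or ALPHABET[0]
    return encoded.rjust(width, ALPHABET[0])


def base62_decode(s: str) -> int:
    """Decode a base-62 string back to the integer; leading '0's are harmless."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this alphabet, `base62_encode(15)` gives `"f"`, and `base62_encode(15, 11)` gives the same value padded to 11 characters. Note that codes derived directly from a sequential ID are guessable, which may or may not matter for shared URLs.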

Tim B answered Oct 17 '22 15:10


The issue with running a hashing function over the full set of possible URLs and then checking the result against an indexed database is that it removes synchronization possibilities. The lookup itself is not the problem: compared with the time it takes to upload a video, checking a code against their database takes practically no time. The same issue arises with pre-calculating codes: that requires synchronization over a single point of access if you want to use distributed computers, which I'm sure they do. I think your third point is probably closest to correct, and that ID is then somehow encoded into a longer number for some reason. (I'm actually not sure what the advantage is over a plain int value; does anyone have a good reason?)
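To make the point about lookup cost concrete, here is a rough Python sketch of the "generate on the fly and check" option from the question. It is not a claim about how YouTube works. The name `new_code`, the 64-symbol alphabet, and the in-memory `set` are all assumptions; in a real system the set would be a column with a unique index, and the check-and-insert would need to be a single atomic operation, which is exactly where the synchronization concern above comes in.

```python
import secrets
import string

# Assumed alphabet: 64 URL-safe symbols (A-Z, a-z, 0-9, '-' and '_').
ALPHABET64 = string.ascii_letters + string.digits + "-_"


def new_code(existing: set, length: int = 11) -> str:
    """Draw random codes until one is unused, then record and return it.

    `existing` stands in for a uniquely indexed database column.
    """
    while True:
        code = "".join(secrets.choice(ALPHABET64) for _ in range(length))
        if code not in existing:  # an index lookup in a real database
            existing.add(code)
            return code
```

With 64 symbols and 11 characters there are 64^11 = 2^66 (about 7.4 × 10^19) possible codes, so a retry caused by a collision would be extremely rare even with billions of records.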

AaronB answered Oct 17 '22 15:10