YouTube seems to have a unique 11 digit code for each video. The code includes 1-9, A-Z, a-z, and some symbols like +_* etc.
How would they calculate this unique code for each video? I am working on something where I'd like to assign a unique code to each record so hence the question.
My questions/concerns are:
ID column for each record in the DB and then somehow convert that unique ID column to an 11 digit code?
My goal is to:
Read up on GUID and UID in general.
Most of the time, if you are using a database, it will generate a unique ID for you, and that ID can then be encoded with numbers and letters to shorten the resulting string.
http://en.wikipedia.org/wiki/Globally_unique_identifier
Shortening the string is about the way you encode the value; it doesn't actually change the value.
For example, the number 15 uses two digits in base 10 (15), one digit in hex (f), and four digits in binary (1111).
In the same way you can use a-z, A-Z, 0-9 and get base 62 to encode numbers into strings using far fewer digits than using base 10.
It's not the only approach but (especially if you already have database rows for it) it's the simplest. You don't even need to pad to 11 unless you really want to - but adding any number of 0's at the start of the encoded string does not alter its value.
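As a rough illustration of the base-62 idea above, here is a minimal Java sketch. The class name `Base62`, its method names, and the alphabet order (digits, then lowercase, then uppercase) are all assumptions made for this example; nothing here is known to be what YouTube actually does. It assumes the input is a non-negative numeric ID such as a database auto-increment key.

```java
final class Base62 {
    // Assumed alphabet order: 0-9, a-z, A-Z. Any fixed order works as long as
    // encode and decode agree on it.
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int BASE = ALPHABET.length(); // 62

    private Base62() {}

    /** Encodes a non-negative ID as a base-62 string. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            // Least significant digit first; reversed at the end.
            sb.append(ALPHABET.charAt((int) (id % BASE)));
            id /= BASE;
        }
        return sb.reverse().toString();
    }

    /** Encodes and left-pads with the zero symbol up to the given width. */
    static String encodePadded(long id, int width) {
        String code = encode(id);
        StringBuilder sb = new StringBuilder();
        for (int i = code.length(); i < width; i++) {
            sb.append(ALPHABET.charAt(0));
        }
        return sb.append(code).toString();
    }

    /** Decodes a base-62 string back to the numeric ID. Leading zeros are harmless. */
    static long decode(String code) {
        long value = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            // The exact-arithmetic helpers throw on overflow instead of wrapping.
            value = Math.addExact(Math.multiplyExact(value, (long) BASE), (long) digit);
        }
        return value;
    }
}
```

With this alphabet, `Base62.encode(15)` gives `f`, `Base62.encodePadded(15, 11)` gives `0000000000f`, and decoding either string returns 15, which is the padding point made above.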
Java even provides functions to do this for you, although the maximum radix they support is 36:
http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#toString%28int,%20int%29
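A small sketch of those built-in conversions, for illustration only. The class `RadixDemo` and its helper names are made up for this example; `Integer.toString(int, int)`, `Long.toString(long, int)`, `Long.parseLong(String, int)` and `Character.MAX_RADIX` are standard JDK members.

```java
final class RadixDemo {
    private RadixDemo() {}

    /** Encodes an ID using the largest radix the JDK supports (36: 0-9 and a-z). */
    static String toBase36(long id) {
        return Long.toString(id, Character.MAX_RADIX);
    }

    /** Parses a base-36 string back into the numeric ID. */
    static long fromBase36(String code) {
        return Long.parseLong(code, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toString(15, 10)); // 15
        System.out.println(Integer.toString(15, 16)); // f
        System.out.println(Integer.toString(15, 2));  // 1111
        System.out.println(toBase36(1_000_000L));     // lfls
    }
}
```

Base 36 uses only digits and lowercase letters, so the strings come out slightly longer than base 62 would give, but no custom alphabet is needed.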
The problem with running a hashing function over the full set of possible URLs and then checking the result against an indexed database is that it removes synchronization possibilities. Compared with the time it takes to upload a video, checking a code against their database takes almost no time, so lookup speed is not the issue. The same problem comes up with pre-calculating codes: that requires synchronization over a single point of access if you want to use distributed computers, which I'm sure they do.
I think your third point is probably closest to correct: each record gets a unique ID, and that ID is then somehow encoded into a longer number for some reason (I'm actually not sure what the advantage is over a plain int value; does anyone have a good reason?).