Why do sites like YouTube, Imgur and most others use random characters as their content ids rather than just sequential numbers, like those created by auto-increment in MySQL?
To explain what I mean:
In the URL: https://www.youtube.com/watch?v=QMlXuT7gd1I
The QMlXuT7gd1I at the end indicates the specific video on that page, but I'm assuming that video also has a unique numeric id in the database. Why do they create and use this alphanumeric string rather than just use the video's database id?
I'm creating a site which identifies content in the URL like above, but I'm currently using just the DB id. I'm considering switching to random strings because all major sites do it, but I'd like to know why this is done before I implement it.
Thanks!
Some sites do that because of sharding.
When you have only one process (one server) writing, it is possible to make an auto-increment id without having duplicate ids, but when you have multiple servers (with multiple processes) writing content, like youtube, it's not possible to use autoincrement id anymore. The costs of synchronization to avoid duplication would be huge.
For example, if you read mongodb's ocjectid documentation you can see this structure for the id: a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter, starting with a random value.
At the end, it's only 12 byte. The thing is when you represent in hexadecimal, it seems like 24 bytes, but that is only when you show it.
Another advantage of this system is that the timestamp is included in the id, so you can decouple the id to get the timestamp.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With