Why do sites use random alphanumeric ids rather than database ids to identify content?

Why do sites like YouTube, Imgur and most others use random characters as their content ids rather than just sequential numbers, like those created by auto-increment in MySQL?

To explain what I mean:

In the URL: https://www.youtube.com/watch?v=QMlXuT7gd1I

The QMlXuT7gd1I at the end indicates the specific video on that page, but I'm assuming that video also has a unique numeric id in the database. Why do they create and use this alphanumeric string rather than just use the video's database id?

I'm creating a site which identifies content in the URL like above, but I'm currently using just the DB id. I'm considering switching to random strings because all major sites do it, but I'd like to know why this is done before I implement it.

Thanks!

user3471040 asked Dec 18 '14 06:12


1 Answer

Some sites do this because of sharding, i.e. splitting their data across many database servers.

When only one process (on one server) writes new content, an auto-increment id can be generated without duplicates. When many servers, each running multiple processes, write content, as at YouTube, a single auto-increment counter no longer works: every insert would have to coordinate with every other writer to avoid handing out the same id, and the cost of that synchronization would be huge.
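A minimal sketch of one way to avoid that synchronization, assuming each server is assigned a distinct worker number once (for example at deploy time): each server keeps its own local counter and puts its worker number in the high bits of every id it issues, so no two servers can ever collide. The class name `WorkerIdGenerator` and the 16/48-bit split are hypothetical choices for illustration, not how YouTube or any particular site actually does it.

```python
import itertools
import threading


class WorkerIdGenerator:
    """Hypothetical per-server id generator that needs no shared counter.

    Assumes every server gets a distinct worker_id (0-65535) up front.
    An id is worker_id in the top 16 bits and a local counter in the low 48 bits.
    """

    COUNTER_BITS = 48

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < 2**16:
            raise ValueError("worker_id must fit in 16 bits")
        self.worker_id = worker_id
        self._counter = itertools.count()
        self._lock = threading.Lock()  # guards only this server's own counter

    def next_id(self) -> int:
        with self._lock:
            n = next(self._counter)
        if n >= 2**self.COUNTER_BITS:
            raise OverflowError("local counter exhausted")
        return (self.worker_id << self.COUNTER_BITS) | n


# Three "servers" issue ids independently; none of them talk to each other.
servers = [WorkerIdGenerator(w) for w in range(3)]
ids = [s.next_id() for s in servers for _ in range(1000)]
print(len(ids), len(set(ids)))  # 3000 3000
```

Because the counter lives in process memory, a restart would reuse ids; practical schemes avoid this by persisting the counter or by mixing in a timestamp, which is what the MongoDB layout does.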

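For example, MongoDB's ObjectId documentation describes the id as: a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter starting at a random value. (Current MongoDB documentation replaces the machine identifier and process id with a single 5-byte random value generated once per process; the timestamp and counter are unchanged.)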

In total, that is only 12 bytes. When shown in hexadecimal it looks like 24 characters, because each byte is written as two hex digits, but that is only how it is displayed.
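To see that the 24 characters are only a display format, the sketch below uses 12 random bytes as a stand-in for an ObjectId and compares their raw, hexadecimal and URL-safe base64 lengths. The base64 line is an extra illustration, not something this answer says MongoDB does; it shows that a denser alphabet is one way to get shorter ids for URLs.

```python
import base64
import os

raw = os.urandom(12)  # stand-in for the 12 bytes of an ObjectId
as_hex = raw.hex()    # each byte becomes two hex digits
as_b64 = base64.urlsafe_b64encode(raw).decode()  # 6 bits per character, no padding for 12 bytes

print(len(raw), len(as_hex), len(as_b64))  # 12 24 16

# Every representation decodes back to the same 12 bytes.
assert bytes.fromhex(as_hex) == raw
assert base64.urlsafe_b64decode(as_b64) == raw
```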

Another advantage of this scheme is that the timestamp is embedded in the id, so you can decode the id to recover when it was created.
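A short sketch of that decoding, assuming the id is given as the usual 24-character hex string: the first 8 hex digits are the big-endian 4-byte timestamp. The function name `object_id_time` is hypothetical, and the sample id is just an example value.

```python
from datetime import datetime, timezone


def object_id_time(hex_id: str) -> datetime:
    """Return the creation time stored in the first 4 bytes of a hex ObjectId."""
    if len(hex_id) != 24:
        raise ValueError("expected 24 hex characters")
    seconds = int(hex_id[:8], 16)  # first 4 bytes = 8 hex digits, big-endian
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(object_id_time("507f1f77bcf86cd799439011"))
```

With PyMongo installed, `bson.ObjectId(hex_id).generation_time` returns the same timestamp as a timezone-aware `datetime`.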

Noogic answered Sep 19 '22 00:09