After searching SO and other sites, I've failed to find conclusive evidence of how Facebook, Twitter and Pinterest generate their IDs. I need this to avoid URL collisions. Moving to an entirely different ID scheme will prevent them because there won't be quadrillions of records.
If you look at Pinterest as an example, the first few digits relate to the user id, and the last 6 or so digits represent the save id, which could be an auto-increment value.
To create a similar (but not unique) ID I was able to use base_convert(user_id . save_id, 16, 10). The problem is that it's not unique. For example, base_convert('15' . '211', 16, 10) and base_convert('152' . '11', 16, 10) give the same result, because both concatenations yield '15211'. Simply merging two unique sets of numbers will still produce duplicate results. Throwing uniqid() into the mix would essentially fix the duplicates, but this doesn't seem like great practice.
Update: Twitter appears to use this: https://github.com/twitter/snowflake
Any suggestions on generating a unique ID like the above examples?
The simplest way to generate identifiers is a serial number: a steadily increasing number that is assigned to whatever you need to identify next. This is the approach used in most internal databases as well as some commonly encountered public identifiers.
The UUID() function in MySQL returns a Universal Unique Identifier (UUID) generated according to RFC 4122, “A Universally Unique Identifier (UUID) URN Namespace”. It is designed as a number that is universally unique: two UUID values are expected to be distinct, even if they are generated on two independent servers.
UUID. UUIDs are 128-bit values, usually written as 32 hexadecimal digits, that are designed to be globally unique.
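If you would rather generate the UUID in PHP than in the database, a minimal sketch of a random (version 4) UUID could look like the following. The function name uuid_v4 is hypothetical, and the sketch assumes PHP 7 or later for random_bytes().

```php
<?php
// Hypothetical helper: builds a random (version 4) UUID string.
function uuid_v4(): string
{
    $bytes = random_bytes(16); // 128 random bits from the OS CSPRNG

    // Set the version field (high nibble of byte 6) to 4.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field (top two bits of byte 8) to binary 10.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // Format the 32 hex digits as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid_v4(), PHP_EOL;
```

Note that a version 4 UUID is random rather than sequential, so it will not sort by creation time the way a serial number does.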
Suppose your IDs are all numeric. Delimit them with the character A (since it surely does not appear in the original IDs) and do a base conversion from base 11 to base 10.
For your example, we now get different results:
echo base_convert("15A211", 11, 10); //247820
echo base_convert("152A11", 11, 10); //238140
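The same idea can be wrapped in a pair of functions so the combined ID can also be split back into its parts. This is only a sketch: encode_pair and decode_pair are hypothetical names, and it assumes both IDs are non-empty digit strings without leading zeros. Keep in mind that base_convert() can lose precision on large numbers, so this is only safe for fairly short IDs.

```php
<?php
// Hypothetical helpers; the names are illustrative, not from any library.

// Join two numeric IDs with the delimiter "a" and read the result as base 11.
function encode_pair(string $userId, string $saveId): string
{
    if (!ctype_digit($userId) || !ctype_digit($saveId)) {
        throw new InvalidArgumentException('IDs must contain digits only');
    }
    return base_convert($userId . 'a' . $saveId, 11, 10);
}

// Reverse the conversion and split on the delimiter to recover both IDs.
function decode_pair(string $id): array
{
    return explode('a', base_convert($id, 10, 11));
}

echo encode_pair('15', '211'), PHP_EOL; // 247820
echo encode_pair('152', '11'), PHP_EOL; // 238140
print_r(decode_pair('247820'));         // recovers "15" and "211"
```

Because the delimiter digit can never occur inside a decimal ID, every (user id, save id) pair maps to a distinct base-11 string, which is what removes the collisions.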
The Flickr comment up above was very useful. We use sharding as well. We have a bigint (int64) locator field. It is generated by combining an int (int32) database id and an int (int32) identity field.
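In PHP, combining the two fields could be sketched roughly as below. The function names are hypothetical, and the sketch assumes a 64-bit PHP build and a database id below 2^31 so the result stays a positive signed integer that fits a bigint column.

```php
<?php
// Hypothetical helpers, assuming 64-bit PHP integers.

// Put the database id in the high 32 bits and the identity in the low 32 bits.
function make_locator(int $databaseId, int $identity): int
{
    if ($databaseId < 0 || $databaseId > 0x7FFFFFFF) {
        throw new RangeException('database id out of range');
    }
    if ($identity < 0 || $identity > 0xFFFFFFFF) {
        throw new RangeException('identity out of range');
    }
    return ($databaseId << 32) | $identity;
}

// Recover [database id, identity] from a locator.
function split_locator(int $locator): array
{
    return [$locator >> 32, $locator & 0xFFFFFFFF];
}

echo make_locator(3, 42), PHP_EOL; // 12884901930
```

Since the two fields occupy separate bit ranges, two different (database id, identity) pairs can never produce the same locator.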
If you know you will never have more databases than fit in an int16 (quite likely), you could combine an int16 (smallint) database id, an int32 (int) user id and an int16 (smallint) action id into one 64-bit value. I don't know what numbers are reasonable for your application. But reserve some part for the database id, even if it's just a tinyint, so you know you're future-proof if you add more databases.
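A rough sketch of that 16/32/16 layout follows. The names pack_id and unpack_id are hypothetical, the field widths are just the ones suggested above, and the database id is capped at 15 bits here so the result stays positive on 64-bit PHP.

```php
<?php
// Hypothetical helpers, assuming 64-bit PHP integers.
// Layout (high to low): database id (16 bits) | user id (32 bits) | action id (16 bits).

function pack_id(int $databaseId, int $userId, int $actionId): int
{
    if ($databaseId < 0 || $databaseId > 0x7FFF) {
        throw new RangeException('database id out of range');
    }
    if ($userId < 0 || $userId > 0xFFFFFFFF) {
        throw new RangeException('user id out of range');
    }
    if ($actionId < 0 || $actionId > 0xFFFF) {
        throw new RangeException('action id out of range');
    }
    return ($databaseId << 48) | ($userId << 16) | $actionId;
}

// Recover [database id, user id, action id] from a packed value.
function unpack_id(int $id): array
{
    return [$id >> 48, ($id >> 16) & 0xFFFFFFFF, $id & 0xFFFF];
}

echo pack_id(1, 2, 3), PHP_EOL; // 281474976841731
```

The range checks matter: a value that overflows its field would spill into the neighbouring bits and could collide with another ID.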