MySQL PRIMARY KEYs: UUID / GUID vs BIGINT (timestamp+random)

Tags:

tl;dr: Is assigning rows IDs of {unixtimestamp}{randomdigits} (such as 1308022796123456) as a BIGINT a good idea if I don't want to deal with UUIDs?

Just wondering if anyone has some insight into any performance or other technical considerations / limitations in regards to IDs / PRIMARY KEYs assigned to database records across multiple servers.

My PHP+MySQL application runs on multiple servers, and the data needs to be able to be merged. So I've outgrown the standard sequential / auto_increment integer method of identifying rows.

My research into a solution brought me to the concept of using UUIDs / GUIDs. However the need to alter my code to deal with converting UUID strings to binary values in MySQL seems like a bit of a pain/work. I don't want to store the UUIDs as VARCHAR for storage and performance reasons.

Another possible annoyance of UUIDs stored in a binary column is the fact that rows IDs aren't obvious when looking at the data in PhpMyAdmin - I could be wrong about this though - but straight numbers seem a lot simpler overall anyway and are universal across any kind of database system with no conversion required.

As a middle ground I came up with the idea of making my ID columns a BIGINT, and assigning IDs using the current unix timestamp followed by 6 random digits. So lets say my random number came about to be 123456, my generated ID today would come out as: 1308022796123456

A one in 10 million chance of a conflict for rows created within the same second is fine with me. I'm not doing any sort of mass row creation quickly.

One issue I've read about with randomly generated UUIDs is that they're bad for indexes, as the values are not sequential (they're spread out all over the place). The UUID() function in MySQL addresses this by generating the first part of the UUID from the current timestamp. Therefore I've copied that idea of having the unix timestamp at the start of my BIGINT. Will my indexes be slow?

Pros of my BIGINT idea:

Gives me the multi-server/merging advantages of UUIDs
Requires very little change to my application code (everything is already programmed to handle integers for IDs)
Half the storage of a UUID (8 bytes vs 16 bytes)

Cons:

??? - Please let me know if you can think of any.

Some follow up questions to go along with this:

Should I use more or less than 6 random digits at the end? Will it make a difference to index performance?
Is one of these methods any "randomer" ?: Getting PHP to generate 6 digits and concatenating them together -VS- getting PHP to generate a number in the 1 - 999999 range and then zerofilling to ensure 6 digits.

Thanks for any tips. Sorry about the wall of text.

408

asked Jun 14 '11 04:06

LaVache

3 Answers

I have run into this very problem in my professional life. We used timestamp + random number and ran into serious issues when our applications scaled up (more clients, more servers, more requests). Granted, we (stupidly) used only 4 digits, and then change to 6, but you would be surprised how often that the errors still happen.

Over a long enough period of time, you are guaranteed to get duplicate key errors. Our application is mission critical, and therefore even the smallest chance it could fail to due inherently random behavior was unacceptable. We started using UUIDs to avoid this issue, and carefully managed their creation.

Using UUIDs, your index size will increase, and a larger index will result in poorer performance (perhaps unnoticeable, but poorer none-the-less). However MySQL supports a native UUID type (never use varchar as a primary key!!), and can handle indexing, searching,etc pretty damn efficiently even compared to bigint. The biggest performance hit to your index is almost always the number of rows indexed, rather than the size of the item being index (unless you want to index on a longtext or something ridiculous like that).

To answer you question: Bigint (with random numbers attached) will be ok if you do not plan on scaling your application/service significantly. If your code can handle the change without much alteration and your application will not explode if a duplicate key error occurs, go with it. Otherwise, bite-the-bullet and go for the more substantial option.

You can always implement a larger change later, like switching to an entirely different backend (which we are now facing... :P)

answered Sep 29 '22 20:09

Colin

You can manually change the autonumber starting number.

ALTER TABLE foo AUTO_INCREMENT = ####

An unsigned int can store up to 4,294,967,295, lets round it down to 4,290,000,000.

Use the first 3 digits for the server serial number, and the final 7 digits for the row id.

This gives you up to 430 servers (including 000), and up to 10 million IDs for each server.

So for server #172 you manually change the autonumber to start at 1,720,000,000, then let it assign IDs sequentially.

If you think you might have more servers, but less IDs per server, then adjust it to 4 digits per server and 6 for the ID (i.e. up to 1 million IDs).

You can also split the number using binary digits instead of decimal digits (perhaps 10 binary digits per server, and 22 for the ID. So, for example, server 76 starts at 2^22*76 = 318,767,104 and ends at 322,961,407).

For that matter you don't even need a clear split. Take 4,294,967,295 divide it by the maximum number of servers you think you will ever have, and that's your spacing.

You could use a bigint if you think you need more identifiers, but that's a seriously huge number.

answered Sep 29 '22 20:09

Ariel

Use the GUID as a unique index, but also calculate a 64-bit (BIGINT) hash of the GUID, store that in a separate NOT UNIQUE column, and index it. To retrieve, query for a match to both columns - the 64-bit index should make this efficient.

What's good about this is that the hash:
a. Doesn't have to be unique.
b. Is likely to be well-distributed.

The cost: extra 8-byte column and its index.

answered Sep 29 '22 20:09

ToolmakerSteve

Related questions
                            
                                Node.js MySQL module - throw err; // Rethrow non-MySQL errors;
                            
                                Indexing VARCHAR in MySQL
                            
                                How to explicitly declare charset=utf8 for Airflow connections
                            
                                why i can insert null into mysql field which is varchar not null
                            
                                Difference between passing the data type and executing an array without them
                            
                                MYSQL - Check conflicting time
                            
                                Amazon Aurora DB Cluster Not Auto Balancing Correctly
                            
                                How can I combine TABLE_PER_CLASS and GenerationType.IDENTITY
                            
                                MySQL Error Code: #1111 - Invalid use of group function [duplicate]
                            
                                Google Dataflow (Apache beam) JdbcIO bulk insert into mysql database
                            
                                Risks of running a cronjob every minute
                            
                                Why can't I upgrade MySQL in Xampp?
                            
                                How to use php to convert odd characters to single characters from iPhone input
                            
                                How to append/delete data from JSONField in django
                            
                                Designing a SaaS on a single database, multi-tenant (SQL)
                            
                                using name column in belongsToMany relation instead of id
                            
                                Lazy fetching single column (class attribute) with Hibernate
                            
                                Nested Cursors in Mysql
                            
                                When should I commit with SQLAlchemy using a for loop?
                            
                                Using MySQL with Django - Access denied for user '@'localhost

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

MySQL PRIMARY KEYs: UUID / GUID vs BIGINT (timestamp+random)

Tags:

performance

sql

database

mysql

database-design

LaVache

People also ask

3 Answers

Colin

Ariel

ToolmakerSteve

Recent Activity

Donate For Us