
UUID performance in MySQL?

We're considering using UUID values as primary keys for our MySQL database. The data being inserted is generated from dozens, hundreds, or even thousands of remote computers and being inserted at a rate of 100-40,000 inserts per second, and we'll never do any updates.

The database itself will typically get to around 50M records before we start to cull data, so not a massive database, but not tiny either. We're also planning to run on InnoDB, though we are open to changing that if there is a better engine for what we're doing.

We were ready to go with Java's Type 4 UUID, but in testing have been seeing some strange behavior. For one, we're storing as varchar(36) and I now realize we'd be better off using binary(16) - though how much better off I'm not sure.

The bigger question is: how badly does this random data screw up the index when we have 50M records? Would we be better off if we used, for example, a type-1 UUID where the leftmost bits were timestamped? Or maybe we should ditch UUIDs entirely and consider auto_increment primary keys?

I'm looking for general thoughts/tips on the performance of different types of UUIDs when they are stored as an index/primary key in MySQL. Thanks!

Patrick Lightbody asked Mar 02 '10 17:03



People also ask

Should I use UUID in MySQL?

This function in MySQL is used to return a Universal Unique Identifier (UUID) generated according to RFC 4122, “A Universally Unique Identifier (UUID) URN Namespace”. It is designed as a number that is universally unique. Two UUID values are expected to be distinct, even if they are generated on two independent servers.

Is UUID bad for primary key?

UUIDs as primary keys aren't a slam dunk, but they do have some advantages: The fact that they're random means that they don't rely on a single sequence to be generated. Multiple entities can generate IDs independently, but still store them to a shared data store without clobbering each other.

Is UUID efficient?

The default representation for UUID, hash, and token values is often the hexadecimal notation. With a cardinality, the number of possible values, of only 16 per byte, it is far from efficient.

Should I use UUID in database?

If your database is or will eventually be distributed, as in the case of a local-first application, or simply if your NoSQL database is scaling up and divided across multiple servers, I'd say that you have almost no choice: use UUIDs! Just know that there are some things you can do to improve performance.


2 Answers

At my job, we use UUID as PKs. What I can tell you from experience is DO NOT USE THEM as PKs (SQL Server by the way).

It's one of those things that is OK when you have fewer than 1,000 records, but when you have millions, it's the worst thing you can do. Why? Because UUIDs are not sequential, every time a new record is inserted MSSQL has to find the correct page for it and then insert the record there. The really ugly consequence is that the pages end up at different sizes and fragmented, so we now have to defragment periodically.

When you use an autoincrement, MSSQL will always go to the last page, and you end up with equally sized pages (in theory) so the performance to select those records is much better (also because the INSERTs will not block the table/page for so long).
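The difference between random and sequential keys can be illustrated with a toy model. The sketch below is not how SQL Server or InnoDB actually manage pages; it assumes a hypothetical leaf-page capacity (`PAGE_CAPACITY`) and a simple "split in half when full" rule, and only counts how many pages and mid-page splits each insert order produces.

```python
import bisect
import random

PAGE_CAPACITY = 64  # hypothetical number of keys that fit in one leaf page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy list of sorted pages.

    Returns (page_count, mid_page_splits).
    """
    pages = []   # each page is a sorted list of keys
    lows = []    # lower bound of each page, used to route a key to its page
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            lows.append(key)
            continue
        # Route the key to the page whose range covers it.
        i = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[i]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Overflow at the very end of the index: start a fresh page
            # and leave the previous one completely full.
            page.pop()
            pages.append([key])
            lows.append(key)
        else:
            # Overflow in the middle: split the page into two half-full pages.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(i + 1, right)
            lows.insert(i + 1, right[0])
            splits += 1
    return len(pages), splits


def compare(n=6400, seed=42):
    """Compare auto-increment style keys with random 128-bit keys."""
    rng = random.Random(seed)
    sequential = simulate_inserts(range(n))
    randomized = simulate_inserts(rng.getrandbits(128) for _ in range(n))
    return sequential, randomized
```

In this model sequential keys always land on the last page, so pages fill completely and no mid-page split happens, while random keys keep splitting pages all over the index and leave them partly empty.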

However, the big advantage of using UUID as PKs is that if we have clusters of DBs, there will not be conflicts when merging.

I would recommend the following model:

1. PK INT Identity
2. Additional column automatically generated as UUID.

This way, the merge process is possible (UUID would be your REAL key, while the PK would just be something temporary that gives you good performance).
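A minimal sketch of that two-column model follows. It uses Python's built-in `sqlite3` as a stand-in rather than SQL Server or MySQL, and the table and column names (`events`, `public_id`, `payload`) are hypothetical; the point is only the shape: a small integer primary key for the index, plus a unique 16-byte UUID column for cross-database identity.

```python
import sqlite3
import uuid


def create_schema(conn):
    # Integer surrogate key for clustering; UUID kept as a unique 16-byte column.
    conn.execute(
        """
        CREATE TABLE events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            public_id BLOB NOT NULL UNIQUE,
            payload TEXT NOT NULL
        )
        """
    )


def insert_event(conn, payload):
    """Insert one row; return (local integer id, globally unique UUID)."""
    public_id = uuid.uuid4()
    cur = conn.execute(
        "INSERT INTO events (public_id, payload) VALUES (?, ?)",
        (public_id.bytes, payload),
    )
    return cur.lastrowid, public_id


def find_by_public_id(conn, public_id):
    """Look a row up by its UUID, e.g. when merging data from another database."""
    return conn.execute(
        "SELECT id, payload FROM events WHERE public_id = ?",
        (public_id.bytes,),
    ).fetchone()
```

The integer `id` is only meaningful inside one database; anything that crosses database boundaries would refer to `public_id` instead.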

NOTE: The best solution is to use NEWSEQUENTIALID (as I was saying in the comments), but for a legacy app with little time to refactor (and, even worse, no control over all inserts), that is not possible. As of 2017, though, I'd say the best solution here is NEWSEQUENTIALID or Guid.Comb with NHibernate.
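NEWSEQUENTIALID and Guid.Comb are specific to SQL Server and NHibernate. For the MySQL case in the question, a different but related technique is to reorder the fields of a version-1 (timestamp-based) UUID so that the slowest-changing time bits come first. The sketch below assumes version-1 UUIDs and a 16-byte binary column; the helper names are hypothetical. As far as I know, the byte order matches what MySQL 8.0's `UUID_TO_BIN(uuid, 1)` produces, but treat that as an assumption to verify.

```python
import uuid


def to_ordered_bytes(u: uuid.UUID) -> bytes:
    """Rearrange a version-1 UUID so its bytes sort roughly by creation time.

    Standard layout: time_low (4 bytes), time_mid (2), time_hi_and_version (2),
    then clock sequence and node (8). time_low changes fastest, so the
    standard layout does not sort by time.
    """
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


def from_ordered_bytes(raw: bytes) -> uuid.UUID:
    """Inverse of to_ordered_bytes: restore the standard UUID layout."""
    return uuid.UUID(bytes=raw[4:8] + raw[2:4] + raw[0:2] + raw[8:16])


if __name__ == "__main__":
    u = uuid.uuid1()
    key = to_ordered_bytes(u)  # value to store in a BINARY(16) column
    assert from_ordered_bytes(key) == u
```

With this ordering, IDs generated close together in time land close together in the index, which is the property that makes auto-increment keys cheap to insert.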

Hope this helps

Kat Lim Ruiz answered Oct 03 '22 13:10



A UUID is a Universally Unique ID. It's the universally part that you should be considering here.

Do you really need the IDs to be universally unique? If so, then UUIDs may be your only choice.

I would strongly suggest that if you do use UUIDs, you store them as a number and not as a string. If you have 50M+ records, then the saving in storage space will improve your performance (although I couldn't say by how much).
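To put a rough number on the storage difference, here is a small sketch comparing the 36-character text form with the 16-byte binary form. The estimate covers raw key bytes only and ignores per-row and per-page overhead, so real index sizes will differ; the function names are illustrative.

```python
import uuid


def storage_forms(u: uuid.UUID):
    """Return the two common ways to store a UUID: text and raw bytes."""
    text_form = str(u)      # e.g. for a varchar(36) / char(36) column
    binary_form = u.bytes   # e.g. for a binary(16) column
    return text_form, binary_form


def raw_key_bytes(row_count: int, key_size: int) -> int:
    """Raw bytes spent on key values alone, ignoring all storage overhead."""
    return row_count * key_size


if __name__ == "__main__":
    text_form, binary_form = storage_forms(uuid.uuid4())
    rows = 50_000_000
    print(len(text_form), raw_key_bytes(rows, len(text_form)))
    print(len(binary_form), raw_key_bytes(rows, len(binary_form)))
```

At 50M rows that is about 1.8 GB of key characters as text versus 0.8 GB as binary, and in InnoDB the primary key value is also repeated in every secondary index entry, so the difference is paid more than once.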

If your IDs do not need to be universally unique, then I don't think that you can do much better than just using auto_increment, which guarantees that IDs will be unique within a table (since the value will increment each time).

Dancrumb answered Oct 03 '22 13:10
