We have a huge InnoDB table with hundreds of millions of rows and only 3 columns: GUID, enum, smallint. All lookups are done by GUID.
We're considering making the GUID the PK and partitioning the table BY KEY.
We've heard that using a GUID as the PK is bad because GUIDs are randomly distributed and InnoDB uses the PK as the clustered index. Storing rows in random GUID order therefore increases fragmentation and page splits.
The alternative to using the GUID as the PK is to create a surrogate auto-increment key and use that as the PK. However, if we want to partition the table by GUID, the GUID has to be part of the PK as well. Also, since all queries are done by GUID, we need an additional GUID index. That index essentially maps GUID->PK, whereas if we use the GUID as the PK, the table itself maps GUID->enum+smallint.
So my question is: do we gain anything by adding an auto-increment PK and an additional GUID index?
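For concreteness, the two layouts being compared might look roughly like the sketch below. The table names, column names, ENUM values, `CHAR(36)` storage for the GUID, and the partition count are all assumptions made for illustration; none of them come from the actual schema.

```sql
-- Option A: the GUID is the primary key, so it is also the clustered index.
-- All names and the ENUM values are hypothetical placeholders.
CREATE TABLE guid_pk (
    guid   CHAR(36) NOT NULL,               -- assumption: GUID stored as text
    kind   ENUM('a', 'b', 'c') NOT NULL,
    amount SMALLINT NOT NULL,
    PRIMARY KEY (guid)
) ENGINE=InnoDB
PARTITION BY KEY (guid)
PARTITIONS 16;                              -- arbitrary partition count

-- Option B: surrogate auto-increment key. Because the table is partitioned
-- by guid, guid must appear in the primary key and in every unique key.
CREATE TABLE surrogate_pk (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    guid   CHAR(36) NOT NULL,
    kind   ENUM('a', 'b', 'c') NOT NULL,
    amount SMALLINT NOT NULL,
    PRIMARY KEY (id, guid),                 -- id first, as AUTO_INCREMENT requires
    UNIQUE KEY uq_guid (guid)               -- the extra GUID -> PK index
) ENGINE=InnoDB
PARTITION BY KEY (guid)
PARTITIONS 16;

-- The lookup is written the same way for both tables. In option A it reads
-- the clustered index directly; in option B it reads uq_guid first and then
-- follows the primary key into the clustered index.
SELECT kind, amount
FROM guid_pk
WHERE guid = '123e4567-e89b-12d3-a456-426614174000';
```

In option B the GUID is stored twice (once in the clustered index as part of the PK, once in `uq_guid`), which is the overhead the question is asking about.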
Thanks, Philopator.
I have a MySQL database in which each table has an ID (int, auto-increment) column as the clustered primary key and a UUID (varchar(36)) column that just holds a GUID. The ID is there just for indexing purposes and nothing more. Joins and all queries will run against the UUID column.
GUIDs as PRIMARY KEYs and/or the clustering key. Instead, SQL Server just uses a forwarding pointer to make one extra hop (never more) to get to the data. In a clustered table, SQL Server uses the clustering key to look up the data. As a result, this puts some strain on the clustering key that was never there before.
If you ask me, I never use a GUID as a database identity column of any type, including the PK, even if you force me to design with a shotgun to my head. Using a GUID as the primary key is a definite scaling stopper, and a critical one. I recommend you check the database's identity and sequence options.
As mentioned earlier, GUID values are unique across tables, databases, and servers. GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.
The problem with using GUIDs as a PK in InnoDB isn't just the fact that GUID distribution is random. It's that records in InnoDB are stored in primary key order. That means that in the table design you're talking about, each new GUID lands at an arbitrary position in the clustered index, and InnoDB has to keep splitting and reorganizing pages to keep the rows sorted. You should use a translation table that maps the GUIDs to int or bigint and use that as the PK.
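One way to read the translation-table suggestion is sketched below. This is only an illustration: the table names (`guid_map`, `item`), the column names, the ENUM values, and the `CHAR(36)` storage are assumptions, not part of the answer above.

```sql
-- Translation table: maps each GUID to a compact surrogate id.
-- All names here are hypothetical.
CREATE TABLE guid_map (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    guid CHAR(36) NOT NULL,                 -- assumption: GUID stored as text
    PRIMARY KEY (id),
    UNIQUE KEY uq_guid (guid)
) ENGINE=InnoDB;

-- Data table: clustered on the surrogate id, so new rows are appended
-- in key order instead of landing on random pages.
CREATE TABLE item (
    id     BIGINT UNSIGNED NOT NULL,
    kind   ENUM('a', 'b', 'c') NOT NULL,    -- placeholder values
    amount SMALLINT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- A lookup by GUID resolves the id through uq_guid, then reads the row by PK.
SELECT i.kind, i.amount
FROM guid_map AS m
JOIN item AS i ON i.id = m.id
WHERE m.guid = '123e4567-e89b-12d3-a456-426614174000';
```

Note that the `uq_guid` index is still filled in random GUID order, so this layout moves the random inserts into a smaller structure rather than removing them, and every lookup by GUID pays for an extra index hop.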