Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UUID Primary Key in Postgres, What Insert Performance Impact?

I am wondering about the performance impact of using a non-sequential UUID as the primary key in a table that will become quite large in PosgreSQL.

In DBMS's that use clustered storage for table records it is a given that using a UUID is going to increase the cost of inserts due to having to read from disk to find the data page into which to perform the insert, once the table is too big to hold in memory. As I understand it, Postgres does not maintain row clustering on inserts, so I imagine that in Postgres using a UUID PK does not hurt the performance of that insert.

But I would think that it makes the insert into the index that the primary key constraint creates much more expensive once the table is large, because it will have to constantly be read from disk to update the index on insertion of new data. Whereas with a sequential key the index will only be updated at the tip which will always be in memory.

Assuming that I understand the performance impact on the index correctly, is there any way to remedy that or are UUIDs simply not a good PK on a large, un-partitioned table?

like image 827
philofinfinitejest Avatar asked Dec 11 '15 18:12

philofinfinitejest


People also ask

Is UUID good for primary key Postgres?

You can use UUID as primary key in your table as it will be unique. However do keep in mind that UUID will occupy a bit more space as compared to SEQUENCE. And also they are not very fast. But yes they are for sure unique and hence you are guaranteed to get a consistent data.

Why UUID is bad for primary key?

Primary keys should never be exposed, even UUIDsA primary key is, by definition unique within its scope. It is, therefore, an obvious thing to use as a customer number, or in a URL to identify a unique page or row. Don't! I would argue that using a PK in any public context is a bad idea.

Is it good to use UUID as primary key?

Pros. Using UUID for a primary key brings the following advantages: UUID values are unique across tables, databases, and even servers that allow you to merge rows from different databases or distribute databases across servers. UUID values do not expose the information about your data so they are safer to use in a URL.

Are UUIDs slow?

This is your primary key, you don't want it to be slow. At its bit level, a UUID is 128 bits, which means it will fit into 16 bytes, note this is not very human readable, but it will keep storage low, and is only 4 times larger than a 32-bit int, or 2 times larger than a 64-bit int.


2 Answers

As I understand it, Postgres does not maintain row clustering on inserts

Correct at the moment. Unfortunately.

so I imagine that in Postgres using a UUID PK does not hurt the performance of that insert.

It still does have a performance cost because of the need to maintain the PK, and because the inserted tuple is bigger.

  • The uuid is 4 times as wide as a typical 32-bit integer synthetic key, so the row to write is 12 bytes bigger and you can fit fewer rows into a given amount of RAM

  • The b-tree index that implements the primary key will be 4x as large (vs a 32-bit key), taking longer to search and requiring more memory to cache. It also needs more frequent page splits.

  • Writes will tend to be random within indexes, not appends to hot, recently accessed rows

is there any way to remedy [the performance impact on the index] or are UUIDs simply not a good PK on a large, un-partitioned table?

If you need a UUID key, you need a UUID key. You shouldn't use one if you don't require one, but if you cannot rely on a central source of synthetic keys and there is no suitable natural key to use, it's still the way to go.

Partitioning won't help much unless you can confine writes to one partition. Also, you won't be able to usefully use constraint exclusion on searches for the key if writing only to one partition at a time, so you'll still have to search all the partitions' indexes for a key when doing queries. I can only see it being useful if your UUID forms part of a composite key and you can partition on the other part of the composite key.

like image 105
Craig Ringer Avatar answered Oct 22 '22 02:10

Craig Ringer


It should be mentioned that you will get more WALs generated if you have btree index on UUID column with full_page_writes option enabled. This happens because of UUID randomness - the values are not sequential so each insert is likely to touch completely new leaf index leaf page. You can read more in On the impact of full-page writes article.

like image 38
pensnarik Avatar answered Oct 22 '22 02:10

pensnarik