PostgreSQL: performance impact of extra columns

Given a large table (10-100 million rows), what's the best way to add some extra (unindexed) columns to it?

  1. Just add the columns.
  2. Create a separate table for each extra column, and use joins when you want to access the extra values.

Does the answer change depending on whether the extra columns are dense (mostly not null) or sparse (mostly null)?

asked Apr 04 '12 by Daniel Winterstein

1 Answer

A column with a NULL value can be added to a row without any changes to the rest of the data page in most cases. Only one bit has to be set in the NULL bitmap. So, yes, a sparse column is much cheaper to add in most cases.
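A quick way to see this in practice (a minimal sketch; big_table and extra_note are placeholder names): adding a nullable column without a DEFAULT only touches the system catalogs, so it completes almost instantly even on a 100-million-row table, and pg_column_size() lets you check how much a row actually occupies.

    -- Minimal sketch with placeholder names.
    -- A nullable column with no default is a catalog-only change:
    ALTER TABLE big_table ADD COLUMN extra_note text;

    -- NULLs are recorded in the row's NULL bitmap, not stored as values,
    -- so rows where extra_note IS NULL barely grow at all:
    SELECT pg_column_size(t.*) AS row_bytes
    FROM   big_table t
    LIMIT  5;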

Whether it is a good idea to create a separate 1:1 table for additional columns very much depends on the use case. It is generally more expensive. For starters, there is an overhead of 28 bytes (heap tuple header plus item identifier) per row and some additional overhead per table. It is also much more expensive to JOIN rows in a query than to read them in one piece. And you need to add a primary / foreign key column plus an index on it. Splitting may be a good idea if you don't need the additional columns in most queries; otherwise it is mostly a bad idea.
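If you do split, the extra table needs its own key column referencing the main table (the PRIMARY KEY constraint also creates the index), and every read of the extra values pays for a join. A minimal sketch, assuming big_table has a bigint id column; all names are placeholders:

    -- Separate 1:1 table for the extra columns:
    CREATE TABLE big_table_extra (
        big_table_id bigint PRIMARY KEY REFERENCES big_table (id),
        extra_a      text,
        extra_b      integer
    );

    -- Every query that needs the extra values now requires a join:
    SELECT b.*, e.extra_a, e.extra_b
    FROM   big_table b
    LEFT   JOIN big_table_extra e ON e.big_table_id = b.id
    WHERE  b.id = 12345;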

Adding a column is fast in PostgreSQL. Updating the values in the column is what may be expensive, because every UPDATE writes a new row (due to the MVCC model). Therefore, it is a good idea to update multiple columns at once.
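In other words, one UPDATE that sets several columns writes a single new row version, while separate UPDATEs write one new version each. A short sketch with placeholder names:

    -- One new row version per affected row:
    UPDATE big_table
    SET    extra_a = 'foo',
           extra_b = 42
    WHERE  id = 12345;

    -- Running two separate UPDATEs instead would write two new row
    -- versions per affected row (twice the dead tuples / bloat):
    -- UPDATE big_table SET extra_a = 'foo' WHERE id = 12345;
    -- UPDATE big_table SET extra_b = 42    WHERE id = 12345;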

See Database page layout in the manual.

How to calculate row sizes:

  • Making sense of Postgres row sizes
  • Calculating and saving space in PostgreSQL

answered Oct 13 '22 by Erwin Brandstetter