PostgreSQL: performance impact of extra columns

Given a large table (10-100 million rows), what's the best way to add some extra (unindexed) columns to it?

  1. Just add the columns.
  2. Create a separate table for each extra column, and use joins when you want to access the extra values.

Does the answer change depending on whether the extra columns are dense (mostly not null) or sparse (mostly null)?

asked Apr 04 '12 by Daniel Winterstein

1 Answer

A column with a NULL value can be added to a row without any changes to the rest of the data page in most cases. Only one bit has to be set in the NULL bitmap. So, yes, a sparse column is much cheaper to add in most cases.
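A quick way to see this in practice (a minimal sketch; big_table and extra_note are placeholder names): adding a nullable column without a DEFAULT only touches the system catalogs, so it completes almost instantly even on a 100-million-row table, and pg_column_size() lets you check how much a row actually occupies.

    -- Minimal sketch with placeholder names.
    -- A nullable column with no default is a catalog-only change:
    ALTER TABLE big_table ADD COLUMN extra_note text;

    -- NULLs are recorded in the row's NULL bitmap, not stored as values,
    -- so rows where extra_note IS NULL barely grow at all:
    SELECT pg_column_size(t.*) AS row_bytes
    FROM   big_table t
    LIMIT  5;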

Whether it is a good idea to create a separate 1:1 table for additional columns very much depends on the use case. It is generally more expensive. For starters, there is an overhead of 28 bytes (heap tuple header plus item identifier) per row and some additional overhead per table. It is also much more expensive to JOIN rows in a query than to read them in one piece. And you need to add a primary / foreign key column plus an index on it. Splitting may be a good idea if you don't need the additional columns in most queries; otherwise it is mostly a bad idea.
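If you do split, the extra table needs its own key column referencing the main table (the PRIMARY KEY constraint also creates the index), and every read of the extra values pays for a join. A minimal sketch, assuming big_table has a bigint id column; all names are placeholders:

    -- Separate 1:1 table for the extra columns:
    CREATE TABLE big_table_extra (
        big_table_id bigint PRIMARY KEY REFERENCES big_table (id),
        extra_a      text,
        extra_b      integer
    );

    -- Every query that needs the extra values now requires a join:
    SELECT b.*, e.extra_a, e.extra_b
    FROM   big_table b
    LEFT   JOIN big_table_extra e ON e.big_table_id = b.id
    WHERE  b.id = 12345;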

Adding a column is fast in PostgreSQL. Updating the values in the column is what may be expensive, because every UPDATE writes a new row (due to the MVCC model). Therefore, it is a good idea to update multiple columns at once.
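In other words, one UPDATE that sets several columns writes a single new row version, while separate UPDATEs write one new version each. A short sketch with placeholder names:

    -- One new row version per affected row:
    UPDATE big_table
    SET    extra_a = 'foo',
           extra_b = 42
    WHERE  id = 12345;

    -- Running two separate UPDATEs instead would write two new row
    -- versions per affected row (twice the dead tuples / bloat):
    -- UPDATE big_table SET extra_a = 'foo' WHERE id = 12345;
    -- UPDATE big_table SET extra_b = 42    WHERE id = 12345;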

See Database page layout in the manual.

How to calculate row sizes:

  • Making sense of Postgres row sizes
  • Calculating and saving space in PostgreSQL

answered Oct 13 '22 by Erwin Brandstetter