Implications of using ADD COLUMN on large dataset

Docs for Redshift say:

ALTER TABLE locks the table for reads and writes until the operation completes.

My question is:
Say I have a table with 500 million rows and I want to add a column. That sounds like a heavy operation that could lock the table for a long time - yes? Or is it actually a quick operation, since Redshift is a columnar db? Or does it depend on whether the column is nullable / has a default value?
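For concreteness, the kind of statement I mean would be something like this (table and column names are made up):

    ALTER TABLE events ADD COLUMN source_id INTEGER;                -- nullable, no default
    ALTER TABLE events ADD COLUMN is_active BOOLEAN DEFAULT false;  -- with a default value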

Asked Jun 02 '14 by Anentropic
2 Answers

I find that adding (and dropping) columns is a very fast operation even on tables with many billions of rows, regardless of whether there is a default value or it's just NULL.

As you suggest, I believe this is a consequence of it being a columnar database, so the rest of the table is undisturbed. It simply creates empty (or nearly empty) column blocks for the new column on each node.
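If you want to see this for yourself, one way (sketched here with a hypothetical table name) is to count how many 1 MB blocks each column of the table occupies before and after the ALTER, using the STV_BLOCKLIST and STV_TBL_PERM system tables:

    -- Blocks per column for a hypothetical table named 'events'
    SELECT b.col, COUNT(*) AS blocks
    FROM stv_blocklist b
    JOIN stv_tbl_perm p ON b.tbl = p.id
    WHERE TRIM(p.name) = 'events'
    GROUP BY b.col
    ORDER BY b.col;

A freshly added column should show up with only a handful of blocks compared with the long-standing columns, which is consistent with the ALTER not rewriting the existing data.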

Answered by Joe Harris


I added an integer column with a default to a table of around 65M rows in Redshift recently and it took about a second to process. This was on a dw2.large (SSD type) single node cluster.

Just remember that you can only add a column at the end (rightmost position) of the table; if you need a column to appear somewhere in the middle, you have to recreate the table via a temporary/new table and copy the data across, as sketched below.
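A rough sketch of that workaround, assuming a hypothetical table events into which a new_col should be placed between two existing columns:

    -- Create a replacement table with the desired column order
    -- (re-declare any DISTKEY / SORTKEY / encoding settings here as well)
    CREATE TABLE events_new (
        id         BIGINT,
        new_col    INTEGER,   -- the column to "insert" in the middle
        created_at TIMESTAMP
    );

    -- Copy the existing data, supplying NULL (or a default) for the new column
    INSERT INTO events_new (id, new_col, created_at)
    SELECT id, NULL, created_at
    FROM events;

    -- Swap the tables
    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_new RENAME TO events;
    DROP TABLE events_old;

Unlike a plain ADD COLUMN, this rewrites the whole table, so on hundreds of millions of rows it is a genuinely heavy operation.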

Answered by novabracket