Implications of using ADD COLUMN on large dataset

Docs for Redshift say:

ALTER TABLE locks the table for reads and writes until the operation completes.

My question is:
Say I have a table with 500 million rows and I want to add a column. That sounds like a heavy operation that could lock the table for a long time - yes? Or is it actually a quick operation, since Redshift is a columnar db? Or does it depend on whether the column is nullable / has a default value?
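For concreteness, the kind of statement I mean would be something like this (table and column names are made up):

    ALTER TABLE events ADD COLUMN source_id INTEGER;                -- nullable, no default
    ALTER TABLE events ADD COLUMN is_active BOOLEAN DEFAULT false;  -- with a default value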

Asked Jun 02 '14 by Anentropic
2 Answers

I find that adding (and dropping) columns is a very fast operation even on tables with many billions of rows, regardless of whether there is a default value or it's just NULL.

As you suggest, I believe this is a consequence of it being a columnar database, so the rest of the table is undisturbed. It simply creates empty (or nearly empty) column blocks for the new column on each node.
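If you want to see this for yourself, one way (sketched here with a hypothetical table name) is to count how many 1 MB blocks each column of the table occupies before and after the ALTER, using the STV_BLOCKLIST and STV_TBL_PERM system tables:

    -- Blocks per column for a hypothetical table named 'events'
    SELECT b.col, COUNT(*) AS blocks
    FROM stv_blocklist b
    JOIN stv_tbl_perm p ON b.tbl = p.id
    WHERE TRIM(p.name) = 'events'
    GROUP BY b.col
    ORDER BY b.col;

A freshly added column should show up with only a handful of blocks compared with the long-standing columns, which is consistent with the ALTER not rewriting the existing data.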

Answered by Joe Harris


I added an integer column with a default to a table of around 65M rows in Redshift recently and it took about a second to process. This was on a dw2.large (SSD type) single node cluster.

Just remember that you can only add a column at the end (rightmost position) of the table; if you need a column to appear somewhere in the middle, you have to recreate the table via a temporary/new table and copy the data across, as sketched below.
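A rough sketch of that workaround, assuming a hypothetical table events into which a new_col should be placed between two existing columns:

    -- Create a replacement table with the desired column order
    -- (re-declare any DISTKEY / SORTKEY / encoding settings here as well)
    CREATE TABLE events_new (
        id         BIGINT,
        new_col    INTEGER,   -- the column to "insert" in the middle
        created_at TIMESTAMP
    );

    -- Copy the existing data, supplying NULL (or a default) for the new column
    INSERT INTO events_new (id, new_col, created_at)
    SELECT id, NULL, created_at
    FROM events;

    -- Swap the tables
    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_new RENAME TO events;
    DROP TABLE events_old;

Unlike a plain ADD COLUMN, this rewrites the whole table, so on hundreds of millions of rows it is a genuinely heavy operation.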

Answered by novabracket