
Dropping column in Postgres on a large dataset

I have a table with a large dataset, and it has three columns that I would like to drop.
The question is: how will Postgres deal with it?

Will it walk through every row, or will it just update the mapping info without much overhead? Can I simply run ALTER TABLE, or should I build a new table and swap it in in this particular case?

And, if it makes any difference, all three columns have fixed length (two integers and one numeric).

I'm sorry if it's been asked already, but Google couldn't find any related questions / articles ...

nikita2206 asked Mar 29 '13 08:03



2 Answers

ALTER TABLE DROP COLUMN only marks the columns as dropped in the system tables. It is very fast, but it does not remove the data from the heap files. To compact the allocated file space you have to run VACUUM FULL afterwards, which is slower and takes an exclusive lock on the table.
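
As a rough sketch of that sequence, assuming a hypothetical table big_table with columns col_a, col_b and col_c (these names are made up for illustration and do not come from the question), the size checks should show that only the rewrite shrinks the heap file:

```sql
-- Hypothetical names: big_table, col_a, col_b, col_c.
SELECT pg_size_pretty(pg_relation_size('big_table'));  -- heap size before

-- Catalog-only change: fast, but still needs a brief ACCESS EXCLUSIVE lock.
ALTER TABLE big_table
    DROP COLUMN col_a,
    DROP COLUMN col_b,
    DROP COLUMN col_c;

SELECT pg_size_pretty(pg_relation_size('big_table'));  -- expected to be unchanged

-- Rewrites the whole table. Holds an ACCESS EXCLUSIVE lock for the duration,
-- needs free disk space for a second copy, and cannot run inside a
-- transaction block.
VACUUM FULL big_table;

SELECT pg_size_pretty(pg_relation_size('big_table'));  -- should be smaller now
```

If the lock taken by VACUUM FULL is not acceptable, the drop alone is enough to hide the columns; the space is then reclaimed gradually as rows are updated.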

Pavel Stehule answered Oct 28 '22 01:10


Google may be useless for this question, but the manual rarely fails:

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

And:

To force an immediate rewrite of the table, you can use VACUUM FULL, CLUSTER or one of the forms of ALTER TABLE that forces a rewrite. This results in no semantically-visible change in the table, but gets rid of no-longer-useful data.

Specifically, the column attisdropped in the system catalog table pg_attribute is set to true.
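
To see this for yourself, a query along these lines lists the attributes of a table including the dropped ones (big_table is a hypothetical table name, not one from the question):

```sql
-- Hypothetical table name: big_table.
SELECT attnum, attname, attisdropped
FROM   pg_attribute
WHERE  attrelid = 'big_table'::regclass
AND    attnum > 0          -- skip system columns such as ctid and xmin
ORDER  BY attnum;
```

Dropped columns stay in the list with attisdropped = true and a placeholder name of the form ........pg.dropped.N........, where N is the original attribute number.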

Side effects

There are minor side-effects (as Chris pointed out):

  • Updated or newly inserted rows still store an invisible NULL value, which forces a NULL bitmap for every new row, even with no NULL in visible columns. This does not affect existing rows, as those keep the original (now invisible) column value.

  • The NULL bitmap must be big enough to cover all visible and dropped columns. In corner cases this may enlarge the NULL bitmap. About effective size:

    • Do nullable columns occupy additional space in PostgreSQL?

  • Dropped columns count against the allowed maximum number of columns per table (1600), which you shouldn't be approaching anyway.

  • There is currently (Postgres 13) no easy way to get rid of the zombie column completely. The table rewrites mentioned above replace the invisible value with NULL (which reclaims almost all of the space), but none of them purges the dropped column from the system catalogs. Not even TRUNCATE does. Only creating a new table (or a dump/restore cycle) does that.
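
A minimal sketch of the "create a new table" route, assuming a hypothetical table big_table with no foreign keys, views or other objects depending on it (if there are any, DROP TABLE will refuse to run and they have to be handled first):

```sql
-- Hypothetical names: big_table, big_table_new.
BEGIN;

-- Block concurrent writes while the copy runs; reads are still allowed.
LOCK TABLE big_table IN SHARE MODE;

-- SELECT * only returns visible columns, so the copy has no dropped ones.
CREATE TABLE big_table_new AS
SELECT * FROM big_table;

-- CREATE TABLE AS copies data only. Recreate indexes, constraints
-- (including NOT NULL), defaults, triggers and privileges here.

DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;

COMMIT;
```

This is usually only worth the effort when the leftover catalog entries actually cause a problem, for example when the table is getting close to the column limit.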

Erwin Brandstetter answered Oct 28 '22 02:10