I have a fairly simple table used to drive tile processing for a web mapping application.
   Column    |           Type           |                        Modifiers
-------------+--------------------------+----------------------------------------------------------
 id          | integer                  | not null default nextval('wmts_tiles_id_seq'::regclass)
 tile_matrix | integer                  |
 rowx        | integer                  |
 coly        | integer                  |
 geom        | geometry(Geometry,27700) |
 processed   | smallint                 |
Indexes:
    "wmts_tiles_pkey" PRIMARY KEY, btree (id)
    "wmts_tiles_wmts_tile_matrix_x_y" UNIQUE CONSTRAINT, btree (tile_matrix, rowx, coly)
    "ix_spatial_wmts_tiles" gist (geom)
    "ix_tile_matrix_processed" btree (tile_matrix, processed)
with various indexes (one spatial) and constraints as shown. This table has 240 million rows, and pg_relation_size and pg_total_relation_size indicate that it is 66 GB, of which half is indexes and half is data.
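For reference, I checked the sizes with something like the following (pg_indexes_size is just there to split out the index share):

select pg_size_pretty(pg_total_relation_size('wmts_tiles')) as total,
       pg_size_pretty(pg_relation_size('wmts_tiles')) as heap,
       pg_size_pretty(pg_indexes_size('wmts_tiles')) as indexes;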
I added a single date column and then ran an update to populate it:
alter table wmts_tiles add column currency_date date;
update wmts_tiles set currency_date = '2014-05-01';
After this, the size went to 133 GB, i.e. it doubled. Once I ran VACUUM FULL on the table, the size shrank back to 67 GB, i.e. 1 GB larger than before, which is what you would expect after adding 240 million rows of a 4-byte field (date).
I understand that there will often be a reasonable percentage of dead rows in a table where a lot of inserts and deletes are happening, but why would a table's size double under one single update, and is there anything I can do to prevent this? Note: this update was the only transaction running, and the table had just been dumped and recreated, so the data pages and indexes were in a compact state prior to the update.
EDIT: I have seen this question, "Why does my postgres table get much bigger under update?", and I understand that, to support MVCC, the table needs to be rewritten while it is being updated. What I don't understand is why it stays twice the size until I explicitly run VACUUM FULL.
Most of this is covered by this prior question.
The reason it doesn't shrink is that PostgreSQL doesn't know you want it to. It's inefficient to allocate disk space (grow a table) and then release it (shrink the table) repeatedly, so PostgreSQL prefers to grow a table and then keep the disk space, marking it empty and ready for re-use.
Not only is the allocation and release cycle inefficient, but the OS also only permits release of space at the end of a file*. So PostgreSQL has to move all the rows from the end of the file, which was the only place it could write them when you did the update, to the now-free space at the start. It couldn't do this as it went because it couldn't overwrite any of the old data until the update transaction committed.
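You can see this dead space for yourself: the statistics collector keeps an approximate count of dead row versions per table, so after the update and before any vacuum you'd expect to see roughly 240 million dead tuples here. A quick check (pg_stat_user_tables is a standard view; the counts are estimates):

select relname, n_live_tup, n_dead_tup
from pg_stat_user_tables
where relname = 'wmts_tiles';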
If you know you won't need the space again, you can use VACUUM FULL to compact the table and release the space.
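For example (VACUUM FULL rewrites the table and its indexes under an ACCESS EXCLUSIVE lock, so run it when you can afford to block all access to the table):

-- compact wmts_tiles and return the freed space to the OS
VACUUM FULL VERBOSE wmts_tiles;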
There's no periodic VACUUM FULL done by autovacuum, partly because it might be quite bad for performance if the table just has to be expanded again, partly because VACUUM FULL is quite I/O-intensive (so it will slow everything else down), and partly because it requires an ACCESS EXCLUSIVE lock that prevents any concurrent access to the table. PostgreSQL would need an incremental table repack command/feature, which it doesn't have yet, for this to be possible with autovacuum. Patches are welcome... though this part of the code is very complex, and getting it right would not be a beginner job.
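In the meantime, if you want to avoid the temporary doubling in the first place, you can run the update in chunks with a plain VACUUM between them, so each chunk re-uses the space freed by the previous one. A sketch, assuming the id values are reasonably dense (the chunk size and ranges are illustrative):

update wmts_tiles set currency_date = '2014-05-01' where id between 1 and 20000000;
vacuum wmts_tiles; -- marks the space of the now-dead row versions re-usable
update wmts_tiles set currency_date = '2014-05-01' where id between 20000001 and 40000000;
vacuum wmts_tiles;
-- ...and so on for the remaining id ranges. Each update commits before the
-- following vacuum runs, so its old row versions are dead and reclaimable.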
*Yes, I know you can mark large regions of zeroes within a file as sparse regions on some platforms. Feel free to submit a PostgreSQL patch for that. In practice you'd have to compact anyway, because you don't normally find large regions of free pages in the table. Plus you'd have to deal with page headers.