 

Why does my postgres table get much bigger under update?

Tags:

postgresql

I have a table clustered on two columns (point of sale and product ID). The only index is on those two columns, and it is the index the table is clustered on.

On a weekly basis, I update other (non-indexed) columns in the table. When I do that, the size of the table and its relations grows by roughly a factor of five. I then CLUSTER the table, and the size reverts to its pre-update value.

This seems strange to me. If I were updating the indexed columns, I'd expect some bloat that I'd need to deal with by vacuuming, but since the indexed columns are not modified by any of the updates, I don't understand why updating the table would lead to an increase in size.

Is this working as expected, or does this point to a problem in my configuration? Is there a way to stop this?

[Postgres 9.1 on Windows 7]

asked Jun 02 '14 by JamesF



1 Answer

Even when no indexed columns change, PostgreSQL still has to do an MVCC update: it writes a new row version ("tuple"), and a later VACUUM discards the old one. Otherwise it couldn't roll back a transaction that fails midway through, or recover from a crash. (PostgreSQL has no undo log; it keeps old row versions in the heap instead.)
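As a sketch of how to observe this (the table and column names here are hypothetical), compare the relation size before and after a bulk update that touches no indexed column:

```sql
-- Hypothetical table 'sales', indexed on (pos_id, product_id),
-- with a non-indexed column 'qty'.
SELECT pg_size_pretty(pg_total_relation_size('sales'));

UPDATE sales SET qty = qty + 1;   -- changes no indexed column

-- Old row versions stay in the heap until VACUUM, so the size still grows:
SELECT pg_size_pretty(pg_total_relation_size('sales'));
```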

A HOT (heap-only tuple) update can only happen when there is enough free space on the row's current page; otherwise the new version has to be written to a different page, which also means creating new index entries. After a CLUSTER the pages are packed full (fillfactor 100 by default), so even though you aren't updating indexed columns, PostgreSQL has to write the new row versions to fresh pages appended at the end of the table: there is simply nowhere to put them on the existing pages.
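You can check what fraction of your updates were HOT from the cumulative counters in pg_stat_user_tables (the table name is illustrative):

```sql
-- n_tup_hot_upd close to n_tup_upd means most updates stayed on-page.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'sales';
```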

I'd usually expect only a doubling in size, but if you run several update passes without VACUUM catching up in between, further growth is expected. Do all your updates in one pass, or VACUUM between passes.
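For example (hypothetical names again), splitting the weekly update into batches with a VACUUM in between lets later batches reuse the space freed by earlier ones instead of extending the table:

```sql
UPDATE sales SET qty = qty + 1 WHERE pos_id BETWEEN 1 AND 1000;
VACUUM sales;   -- reclaims the dead row versions from the first batch
UPDATE sales SET qty = qty + 1 WHERE pos_id BETWEEN 1001 AND 2000;
VACUUM sales;
```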

To make the updates faster at the cost of some disk space, use ALTER TABLE to set a FILLFACTOR below 100 on your table before you CLUSTER it. I suggest 45: enough room for one new version of each row, plus a little wiggle room. That makes the table roughly twice the size, but it cuts the churn of all that rewriting. It lets HOT updates occur, and it also speeds up updates because the relation doesn't have to be extended all the time.
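A minimal sketch, assuming the table and its clustering index are named sales and sales_pos_product_idx:

```sql
ALTER TABLE sales SET (fillfactor = 45);
-- CLUSTER rewrites the table, so the new fillfactor takes effect here:
CLUSTER sales USING sales_pos_product_idx;
```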

Best of all - try to find a way to avoid having to bulk update the whole table periodically.

answered Oct 12 '22 by Craig Ringer