
Free space after massive postgres delete

I have a table with 9 million rows, and I have worked out that around 90% of them can be deleted. What maintenance is needed after the cleanup: VACUUM, REINDEX, something else?

Thanks in advance.

Milan Dobrota, asked Mar 16 '11 15:03



1 Answer

If you want to free up space on the file system, either VACUUM FULL or CLUSTER can help you. Run ANALYZE afterwards so that the planner has up-to-date statistics. Both commands lock the table entirely for their whole duration (reads and writes on it will hang), so you probably want to take your application offline while they run.

In PostgreSQL 8.2 and earlier, VACUUM FULL is probably your best bet.

In PostgreSQL 8.3 and 8.4, the CLUSTER command was significantly improved, so VACUUM FULL is not recommended: it is slow and it will bloat your indexes. CLUSTER re-creates the indexes from scratch, without bloat, and in my experience it is usually much faster too. CLUSTER also physically sorts the whole table by an index, so you have to pick one. If you don't know which, the primary key will work fine.

In PostgreSQL 9.0 and later, VACUUM FULL rewrites the whole table the way CLUSTER does, so either one is a good choice.
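To make the version-by-version advice concrete, here is a minimal Python sketch that only builds the SQL statements to run, in order. The function name `reclaim_statements` and the table and index names `events` and `events_pkey` are hypothetical, and the sketch assumes the names are trusted identifiers that need no quoting. It does not connect to a database; you would paste the resulting statements into psql or send them through your own driver.

```python
def reclaim_statements(server_version, table, index=None):
    """Return the SQL statements to run, in order, after a massive delete.

    server_version is a (major, minor) tuple such as (8, 4) or (9, 0).
    table and index are assumed to be trusted, already-valid identifiers.
    """
    if server_version >= (9, 0):
        # 9.0+: VACUUM FULL works like CLUSTER, so both are fine.
        # Use CLUSTER only when the caller picked an index to sort by.
        if index is not None:
            rewrite = f"CLUSTER {table} USING {index};"
        else:
            rewrite = f"VACUUM FULL {table};"
    elif server_version >= (8, 3):
        # 8.3 / 8.4: prefer CLUSTER, which needs an index to sort by.
        if index is None:
            raise ValueError(
                "CLUSTER needs an index on 8.3/8.4; the primary key works fine"
            )
        rewrite = f"CLUSTER {table} USING {index};"
    else:
        # 8.2 and earlier: VACUUM FULL is probably the best bet.
        rewrite = f"VACUUM FULL {table};"

    # Refresh planner statistics once the table has been rewritten.
    return [rewrite, f"ANALYZE {table};"]


if __name__ == "__main__":
    # Hypothetical table and index names, for illustration only.
    for statement in reclaim_statements((8, 4), "events", "events_pkey"):
        print(statement)
```

Whichever branch applies, the table stays locked while the first statement runs, so the same advice about taking the application offline holds.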

It's hard to make predictions, but on a properly tuned server with commodity hardware, 9 million rows shouldn't take longer than 20 minutes.

  • See the documentation for CLUSTER.
  • PostgreSQL wiki about VACUUM FULL and recovering dead space
intgr, answered Sep 18 '22 07:09