I have a 9 million row table. I figured out that a large amount of it (around 90%) can be freed up. What actions are needed after the cleanup? Vacuum, reindex etc.
Thanks in advance.
If you want to actually reclaim that space on disk, making it available to the OS, you'll need to run VACUUM FULL. Keep in mind that VACUUM can run concurrently, but VACUUM FULL requires an exclusive lock on the table. You will also want to REINDEX, since the indexes will remain bloated even after the VACUUM runs.
PostgreSQL does not impose a limit on the total size of a database. Databases of 4 terabytes (TB) are reported to exist. A database of this size is more than sufficient for all but the most demanding applications.
In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
A vacuum is used for recovering space occupied by “dead tuples” in a table. A dead tuple is created when a record is either deleted or updated (a delete followed by an insert). PostgreSQL doesn't physically remove the old row from the table but puts a “marker” on it so that queries don't return that row.
If you want to free up space on the file system, either VACUUM FULL or CLUSTER can help you. You will also want to run ANALYZE after these, to make sure the planner has up-to-date statistics. Your tables will be entirely locked during this procedure (reads and writes on this table will hang), so you probably want to take your application offline for the duration.
In PostgreSQL 8.2 and earlier, VACUUM FULL is probably your best bet.
In PostgreSQL 8.3 and 8.4, the CLUSTER command was significantly improved, so VACUUM FULL is not recommended -- it's slow and it will bloat your indexes. CLUSTER will re-create indexes from scratch and without bloat. In my experience it's usually much faster too. CLUSTER will additionally sort the whole physical table using an index, so you have to pick an index. If you don't know which, the primary key will work fine.
In PostgreSQL 9.0, VACUUM FULL was changed to work like CLUSTER, so both are good.
It's hard to make predictions, but on a properly tuned server with commodity hardware, 9 million rows shouldn't take longer than 20 minutes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With