Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql explicit VACUUM vs. auto-VACUUM: Differences? Recommendations?

Quick question from a PostgreSQL (relative) newb:

We run a batch process that, as its final step, deletes most of the previous batches.

Disk space is a concern, so we need to ensure that PostgreSQL cleans up after itself.

Other than forcing PostgreSQL to garbage-collect faster, is there any difference between explicitly calling VACUUM at the end of the batch vs. letting the auto-VACUUM daemon handle it? Is there any reason to recommend one approach vs. the other?

Thanks!

like image 599
Jim Burnell Avatar asked Mar 07 '23 04:03

Jim Burnell


1 Answers

Way back when there was one vacuum, and it was full and blocking. Then PostgreSQL guys added non-blocking vacuum. But you still had to schedule it yourself.

Then some genius made a daemon that ran vacuum automatically for you when the tables needed it. It uses the exact same vacuum command you or I would use, but has a lot of settings, especially default ones, that make it run slower and less intrusively. Primarily these settings are for worker threads (default 3), delay cost (20ms for autovac, 0ms for regular vac) and autovacuum cost delay limit (-1, i.e. use system setting which is 200).

Therefore, regular vacuum is VERY aggressive with no cost delay, and will run as hard and fast as your IO subsystem will let it. It basically competes with your regular workload for IO bandwidth.

Generally you can do one of two things in your situation:

One: Make autovacuum more aggressive. By lowering the autovacuum_vacuum_cost_delay from 20 to something in the 2 to 5 range it will run much faster but still not get in the way too much.

Two: Run regular vacuums by hand. Since regular vacuums, by default, have no cost_delay, this will be the fastest but also the most distruptive.

Decision is yours based on usage patterns etc.

like image 140
Scott Marlowe Avatar answered May 11 '23 01:05

Scott Marlowe