I have a Postgres 9.4.18 database running on Heroku in a Django project. I had noticed that queries were becoming increasingly slow, so I ran an "EXPLAIN ANALYZE" on one query and noticed that for one node the row estimate was massively higher than the actual row count:
-> Seq Scan on listings_listing u1 (cost=0.00..1536692.01 rows=5030003 width=8) (actual time=0.811..11263.410 rows=173537 loops=1)
I then ran "VACUUM FULL ANALYZE" on the table and then reran the "EXPLAIN ANALYZE" on the query and got:
-> Seq Scan on listings_listing u1 (cost=0.00..23554.61 rows=173537 width=8) (actual time=0.001..33.884 rows=173537 loops=1)
The execution time is now 100x faster.
So the two questions are: A) shouldn't auto vacuuming be preventing this? (how do I check if that is enabled?) B) how did it get this way assuming the vacuuming isn't being performed?
--------------------------------- Update
I found this command from Heroku that gives autovacuum stats; here is the output (unfortunately, I only ran it after my manual vacuum):
heroku pg:vacuum_stats DATABASE_URL
schema | table | last_vacuum | last_autovacuum | rowcount | dead_rowcount | autovacuum_threshold | expect_autovacuum
--------+-----------------------------------------+-------------+------------------+----------------+----------------+----------------------+-------------------
public | listings_listing | | 2018-06-27 15:36 | 173,537 | 0 | 34,757 |
Given the indicated threshold, it would seem that autovacuum should have run a long time ago.
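(For reference, that threshold is consistent with Postgres's default trigger formula: autovacuum runs once dead rows exceed autovacuum_vacuum_threshold, default 50, plus autovacuum_vacuum_scale_factor, default 0.2, times the row count. For this table that works out to:

```sql
-- 50 + 0.2 * 173537 = 34757.4, matching the 34,757 threshold shown above
SELECT 50 + 0.2 * 173537 AS autovacuum_threshold;
```

)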
Also, here is the Heroku page for their documentation on vacuuming settings: https://devcenter.heroku.com/articles/managing-vacuum-on-heroku-postgres
To find out if autovacuum is enabled as it should be, run
SHOW autovacuum;
To find out if autovacuum is disabled for your specific table, run
SELECT reloptions FROM pg_class WHERE relname = 'listings_listing';
The answer to B) is simple:
If autovacuum isn't running, every UPDATE or DELETE will create a “dead tuple” (or “dead row version”) in the table. These will never be cleaned up unless you run VACUUM manually, and they will cause the table to grow so that sequential scans become slower.
The answer to A) is more difficult:
There are several things that can keep autovacuum from doing its job:
The rate of changes to this table can be so high that autovacuum, which by default runs slowly so that it does not disturb normal activity, cannot keep up.
In this case you should tune autovacuum to be more aggressive for this table:
ALTER TABLE listings_listing SET (
autovacuum_vacuum_cost_limit = 1000,
toast.autovacuum_vacuum_cost_limit = 1000
);
If that is not good enough, you can
ALTER TABLE listings_listing SET (
autovacuum_vacuum_cost_delay = 0,
toast.autovacuum_vacuum_cost_delay = 0
);
There are concurrent long transactions.
Autovacuum can only remove dead tuples that are older than the oldest running transaction, so long transactions can keep it from doing its job.
There is more to the story; read this blog post.
However, this would also keep VACUUM (FULL) from doing its job, so maybe that is not your problem.
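One way to check for long-running transactions (a sketch using the standard pg_stat_activity view; adjust the interval to your workload) is:

```sql
-- Show transactions that have been open for more than an hour;
-- dead tuples newer than the oldest of these cannot be removed
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < current_timestamp - interval '1 hour'
ORDER BY xact_start;
```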
The table is frequently locked with SHARE UPDATE EXCLUSIVE or stronger locks, for example by running “LOCK listings_listing”.
When autovacuum encounters such a lock, it backs down rather than blocking user activity.
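You can check for such locks while they are held. This sketch joins pg_locks against pg_class (note that the internal lock-mode names differ slightly from the SQL LOCK syntax):

```sql
-- List strong locks currently held or requested on the table
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'listings_listing'
  AND l.mode IN ('ShareUpdateExclusiveLock', 'ShareLock',
                 'ShareRowExclusiveLock', 'ExclusiveLock',
                 'AccessExclusiveLock');
```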
A useful way to determine what is going on is to query pg_stat_user_tables like this:
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'listings_listing';
But that evidence is probably destroyed now that you have run VACUUM (FULL).
Another good thing to do is to set log_autovacuum_min_duration to a value other than -1 and check the log occasionally.
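On a managed platform like Heroku you may not be able to change server-wide settings, but log_autovacuum_min_duration is also available as a per-table storage parameter, for example:

```sql
-- Log every autovacuum (and autoanalyze) run on this table,
-- no matter how short (0 means "log all runs")
ALTER TABLE listings_listing SET (log_autovacuum_min_duration = 0);
```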