I have a Postgres 9.4.18 database running on Heroku in a Django project. I had noticed that queries were becoming increasingly slow, so I ran an "EXPLAIN ANALYZE" on one query and noticed that for one node the row estimate was massively higher than the actual row count:
-> Seq Scan on listings_listing u1 (cost=0.00..1536692.01 rows=5030003 width=8) (actual time=0.811..11263.410 rows=173537 loops=1)
I then ran "VACUUM FULL ANALYZE" on the table and then reran the "EXPLAIN ANALYZE" on the query and got:
-> Seq Scan on listings_listing u1 (cost=0.00..23554.61 rows=173537 width=8) (actual time=0.001..33.884 rows=173537 loops=1)
The execution time is now 100x faster.
So the two questions are: A) shouldn't auto vacuuming be preventing this? (how do I check if that is enabled?) B) how did it get this way assuming the vacuuming isn't being performed?
--------------------------------- Update
I found this command from Heroku that gives autovacuum stats; here is the output (unfortunately, I only ran it after my manual vacuum):
heroku pg:vacuum_stats DATABASE_URL
schema | table | last_vacuum | last_autovacuum | rowcount | dead_rowcount | autovacuum_threshold | expect_autovacuum
--------+-----------------------------------------+-------------+------------------+----------------+----------------+----------------------+-------------------
public | listings_listing | | 2018-06-27 15:36 | 173,537 | 0 | 34,757 |
Given the indicated threshold, it would seem that autovacuum should have run a long time ago.
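(For reference, that threshold is consistent with Postgres's default trigger formula: autovacuum runs once dead rows exceed autovacuum_vacuum_threshold, default 50, plus autovacuum_vacuum_scale_factor, default 0.2, times the row count. For this table that works out to:

```sql
-- 50 + 0.2 * 173537 = 34757.4, matching the 34,757 threshold shown above
SELECT 50 + 0.2 * 173537 AS autovacuum_threshold;
```

)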
Also, here is the Heroku page for their documentation on vacuuming settings: https://devcenter.heroku.com/articles/managing-vacuum-on-heroku-postgres
To find out if autovacuum is enabled as it should be, run
SHOW autovacuum;
To find out if autovacuum is disabled for your specific table, run
SELECT reloptions FROM pg_class WHERE relname = 'listings_listing';
The answer to B) is simple:
If autovacuum isn't running, every UPDATE or DELETE will create a “dead tuple” (or “dead row version”) in the table. These will never be cleaned up unless you run VACUUM manually, and they will cause the table to grow so that sequential scans become slower.
The answer to A) is more difficult:
There are several things that can keep autovacuum from doing its job:
The rate of changes to this table can be so high that autovacuum, which by default runs slowly so that it does not disturb normal activity, cannot keep up.
In this case you should tune autovacuum to be more aggressive for this table:
ALTER TABLE listings_listing SET (
autovacuum_vacuum_cost_limit = 1000,
toast.autovacuum_vacuum_cost_limit = 1000
);
If that is not good enough, you can
ALTER TABLE listings_listing SET (
autovacuum_vacuum_cost_delay = 0,
toast.autovacuum_vacuum_cost_delay = 0
);
There are concurrent long transactions.
Autovacuum can only remove dead tuples that are older than the oldest running transaction, so long transactions can keep it from doing its job.
There is more to the story; read this blog post.
However, this would also keep VACUUM (FULL) from doing its job, so maybe that is not your problem.
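One way to check for long-running transactions (a sketch using the standard pg_stat_activity view; adjust the interval to your workload) is:

```sql
-- Show transactions that have been open for more than an hour;
-- dead tuples newer than the oldest of these cannot be removed
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < current_timestamp - interval '1 hour'
ORDER BY xact_start;
```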
The table is frequently locked with SHARE UPDATE EXCLUSIVE or stronger locks, for example by running “LOCK listings_listing”.
When autovacuum encounters such a lock, it backs down rather than blocking user activity.
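You can check for such locks while they are held. This sketch joins pg_locks against pg_class (note that the internal lock-mode names differ slightly from the SQL LOCK syntax):

```sql
-- List strong locks currently held or requested on the table
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'listings_listing'
  AND l.mode IN ('ShareUpdateExclusiveLock', 'ShareLock',
                 'ShareRowExclusiveLock', 'ExclusiveLock',
                 'AccessExclusiveLock');
```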
A useful way to determine what is going on is to query pg_stat_user_tables like this:
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'listings_listing';
But that evidence is probably destroyed now that you have run VACUUM (FULL).
Another good thing to do is to set log_autovacuum_min_duration to a value other than -1 and check the log occasionally.
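On a managed platform like Heroku you may not be able to change server-wide settings, but log_autovacuum_min_duration is also available as a per-table storage parameter, for example:

```sql
-- Log every autovacuum (and autoanalyze) run on this table,
-- no matter how short (0 means "log all runs")
ALTER TABLE listings_listing SET (log_autovacuum_min_duration = 0);
```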