I have a PostgreSQL v10 database with a size of about 100GB.
What is the most efficient (fastest) way to rebuild statistics, for example after a major version upgrade?
ANALYZE with no parameters updates statistics for the entire database by default, and it's painfully slow! It also appears to run as a single process.
Is there any way to parallelize this to speed it up?
ANALYZE collects statistics about the contents of tables in the database and stores the results in the pg_statistic system catalog. The query planner then uses these statistics to help determine the most efficient execution plans for queries. You can run ANALYZE against the whole database, a single table, or specific columns of a table.
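For instance, ANALYZE can be invoked at any of these granularities (my_table and my_column are placeholder names, not objects from the question):

ANALYZE;                       -- whole database
ANALYZE my_table;              -- a single table
ANALYZE my_table (my_column);  -- a single column of that table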
You could use vacuumdb with the same options that pg_upgrade suggests:
vacuumdb --all --analyze-in-stages
The documentation describes what it does:
Only calculate statistics for use by the optimizer (no vacuum), like --analyze-only. Run several (currently three) stages of analyze with different configuration settings, to produce usable statistics faster.

This option is useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. This option will try to create some statistics as fast as possible, to make the database usable, and then produce full statistics in the subsequent stages.
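As a rough sketch of what those stages amount to (this mirrors the staged behavior described above, not necessarily the exact commands vacuumdb issues internally), each stage is essentially a database-wide ANALYZE under a progressively finer default_statistics_target:

SET default_statistics_target = 1;   -- stage 1: minimal statistics, fastest
ANALYZE;
SET default_statistics_target = 10;  -- stage 2: coarse but usable statistics
ANALYZE;
RESET default_statistics_target;     -- stage 3: full statistics (default is 100)
ANALYZE;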
To calculate statistics with several parallel processes, you can use the -j option of vacuumdb.
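For example, something along these lines should run each analyze stage across all databases with four parallel workers (the job count of 4 is only illustrative; match it to your CPU cores):

vacuumdb --all --analyze-in-stages --jobs=4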