
What is the fastest way to rebuild PostgreSQL statistics from zero/scratch with ANALYZE?

I have a PostgreSQL v10 database with a size of about 100GB.

What is the most efficient (fastest) way to rebuild statistics, for example after a major version upgrade?

ANALYZE with no parameters updates statistics for the entire database by default, and it's painfully slow! It seems to run as a single process.

Is there any way to parallelize this to speed it up?

maxTrialfire asked Oct 23 '18 22:10




1 Answer

You could use vacuumdb with the same options that pg_upgrade suggests:

vacuumdb --all --analyze-in-stages

The documentation describes what it does:

Only calculate statistics for use by the optimizer (no vacuum), like --analyze-only. Run several (currently three) stages of analyze with different configuration settings, to produce usable statistics faster.

This option is useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. This option will try to create some statistics as fast as possible, to make the database usable, and then produce full statistics in the subsequent stages.

To calculate statistics with several parallel processes, you can use the option -j of vacuumdb.
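As a rough sketch of combining the two options, assuming four parallel connections suits your server (the number is only a placeholder, and JOBS and RUN are variable names made up for this example), something like the following could work. By default it only prints the command so you can review it; set RUN=1 to actually execute it.

```shell
#!/bin/sh
# Sketch: staged ANALYZE of every database, spread over several connections.
# JOBS and RUN are hypothetical variable names used by this sketch only.
JOBS="${JOBS:-4}"   # parallel connections; keep this below max_connections

# --all                 process every database in the cluster
# --analyze-in-stages   several ANALYZE passes, coarse statistics first
# --jobs                number of concurrent connections vacuumdb opens
set -- vacuumdb --all --analyze-in-stages --jobs="$JOBS"

if [ "${RUN:-0}" = "1" ]; then
    "$@"                # run it for real (needs a reachable server)
else
    echo "$*"           # default: only print the command for review
fi
```

Each job is a separate connection to the server, so the value you pick has to fit within max_connections, and going far beyond the number of CPU cores or what the disks can sustain is unlikely to help.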

Laurenz Albe answered Oct 13 '22 22:10
