 

How do I know if the statistics of a Postgres table are up to date?


In pgAdmin, whenever a table's statistics are out-of-date, it prompts:

Running VACUUM recommended

The estimated rowcount on the table schema.table deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.

I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.

Let's say I'm building a web-based system in Java. How do I detect whether a table's statistics are out of date, so that I can show a prompt like the one in pgAdmin?

Because of the nature of my application, here are a few rules I have to follow:

  1. I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.

  2. I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)

  3. I can't run vacuum/analyze every time to make it up-to-date.

  4. When a user selects a table, I need to show the "outdated" prompt whenever there are changes to that table (such as deletes, inserts, and updates) that are not yet reflected in pg_stats and pg_statistic.

It doesn't seem feasible to do this by analyzing the timestamps in pg_catalog.pg_stat_all_tables. Of course, if a table has never been analyzed, I can check whether it has a timestamp in last_analyze to find out that its statistics are missing. With this method, however, I can't detect staleness once a timestamp already exists: no matter how many rows I add to the table, its last_analyze timestamp in pg_stat_all_tables still reflects the first ANALYZE (assuming the autovacuum flag is off). So I can only show the "Running VACUUM recommended" prompt the first time.
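For reference, this timestamp check boils down to a query like the following (a minimal sketch; public.mytable is only a placeholder, and last_autoanalyze has to be checked as well in case autovacuum happens to be on):

-- minimal sketch: when were this table's statistics last refreshed?
SELECT schemaname,
       relname,
       last_analyze,      -- set by a manual ANALYZE
       last_autoanalyze   -- set by the autovacuum daemon, if it is running
FROM   pg_stat_all_tables
WHERE  schemaname = 'public'     -- placeholder schema
AND    relname    = 'mytable';   -- placeholder table name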

Comparing the last_analyze timestamp to the current time isn't feasible either: there might be no updates to the table for days, or there might be tons of updates within a single hour.

Given this scenario, how can I always tell if the statistics of a table are up-to-date?

asked Aug 01 '11 by Beibei




2 Answers

Check the system catalogs.

test=# SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables WHERE relname = 'city';
 schemaname | relname |         last_analyze
------------+---------+-------------------------------
 pagila     | city    | 2011-07-26 19:30:59.357898-07
 world      | city    | 2011-07-26 19:30:53.119366-07
(2 rows)

All kinds of useful information in there:

test=# \d pg_stat_all_tables
         View "pg_catalog.pg_stat_all_tables"
      Column       |           Type           | Modifiers
-------------------+--------------------------+-----------
 relid             | oid                      |
 schemaname        | name                     |
 relname           | name                     |
 seq_scan          | bigint                   |
 seq_tup_read      | bigint                   |
 idx_scan          | bigint                   |
 idx_tup_fetch     | bigint                   |
 n_tup_ins         | bigint                   |
 n_tup_upd         | bigint                   |
 n_tup_del         | bigint                   |
 n_tup_hot_upd     | bigint                   |
 n_live_tup        | bigint                   |
 n_dead_tup        | bigint                   |
 last_vacuum       | timestamp with time zone |
 last_autovacuum   | timestamp with time zone |
 last_analyze      | timestamp with time zone |
 last_autoanalyze  | timestamp with time zone |
 vacuum_count      | bigint                   |
 autovacuum_count  | bigint                   |
 analyze_count     | bigint                   |
 autoanalyze_count | bigint                   |
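If you want something closer to pgAdmin's "estimated vs. actual rowcount" hint, one possible approach is to compare the planner's estimate in pg_class.reltuples with the live-row count tracked in this view. This is only a sketch of such a heuristic, not pgAdmin's actual logic; mytable and the 20% threshold are placeholders:

-- sketch: flag a table whose planner estimate deviates from the tracked live-row count
SELECT s.schemaname,
       s.relname,
       c.reltuples::bigint AS estimated_rows,   -- refreshed only by VACUUM/ANALYZE
       s.n_live_tup        AS tracked_live_rows -- maintained by the statistics collector
FROM   pg_stat_all_tables s
JOIN   pg_class c ON c.oid = s.relid
WHERE  s.relname = 'mytable'                    -- placeholder table name
AND    abs(c.reltuples - s.n_live_tup) > 0.2 * greatest(c.reltuples, 1);

Because reltuples only changes when VACUUM or ANALYZE runs, while n_live_tup is kept current as rows are inserted and deleted, a large gap between the two is a reasonable sign that the statistics are stale.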
answered Oct 08 '22 by Sean


You should not have to worry about vacuuming in your application. Instead, have the autovacuum process configured on your server (in postgresql.conf), and the server will take care of the VACUUM and ANALYZE runs based on its own internal statistics. You can configure how often it runs and what its threshold variables are.
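To make that concrete, the relevant settings live in postgresql.conf. The values below are the server defaults and are shown only as a sketch of which knobs exist; tune them for your own workload:

autovacuum = on                         # enable the autovacuum launcher
autovacuum_naptime = 1min               # how often the launcher looks for work
autovacuum_analyze_threshold = 50       # minimum number of changed rows before an auto-ANALYZE
autovacuum_analyze_scale_factor = 0.1   # ...plus this fraction of the table's rows
autovacuum_vacuum_threshold = 50        # same idea for auto-VACUUM
autovacuum_vacuum_scale_factor = 0.2

A table is auto-analyzed once the number of rows changed since the last ANALYZE exceeds autovacuum_analyze_threshold plus autovacuum_analyze_scale_factor times the table's row count.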

answered Oct 08 '22 by atrain