Why does vacuum analyze change query plan while analyze does not?

I wanted to utilize the power of index-only scans in Postgres and experimented with one table:

CREATE TABLE dest.contexts
(
  id integer NOT NULL,
  phrase_id integer NOT NULL,
  lang character varying(5) NOT NULL,
  ranking_value double precision,
  index_min integer,
  index_max integer,
  case_sensitive boolean,
  is_enabled boolean,
  is_to_sync boolean NOT NULL DEFAULT true
);

insert into dest.contexts select * from source.contexts;

alter table dest.contexts
  add constraint pk_contexts primary key (id, phrase_id, lang);

 CREATE INDEX idx_contexts_
  ON dest.contexts
  USING btree
  (id, is_enabled, lang, phrase_id, ranking_value, index_min, index_max, case_sensitive);

The index covers all columns I want to use in the next query:

explain analyze
select ranking_value, index_min, index_max, case_sensitive
from dest.contexts
where id = 456 and is_enabled

I check the plan immediately after creation:

Bitmap Heap Scan on contexts  (cost=4.41..31.46 rows=12 width=17) (actual time=0.045..0.045 rows=0 loops=1)
  Recheck Cond: (id = 456)
  Filter: is_enabled
  ->  Bitmap Index Scan on idx_contexts_  (cost=0.00..4.40 rows=12 width=0) (actual time=0.038..0.038 rows=0 loops=1)
        Index Cond: ((id = 456) AND (is_enabled = true))
Planning time: 0.631 ms
Execution time: 0.093 ms

It is strange, but OK...

A few seconds later it changes to another plan (autovacuum?):

Index Scan using pk_contexts on contexts  (cost=0.28..17.93 rows=6 width=17) (actual time=0.027..0.027 rows=0 loops=1)
  Index Cond: (id = 456)
  Filter: is_enabled
Planning time: 0.185 ms
Execution time: 0.070 ms
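
(A possible way to check whether autovacuum or autoanalyze really fired on the table — a sketch using the standard pg_stat_user_tables view:)

select relname, last_autovacuum, last_autoanalyze
from pg_stat_user_tables
where schemaname = 'dest' and relname = 'contexts';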

I try to force it to use an index-only scan:

analyze dest.contexts

But it does not change anything. Then I do

vacuum verbose analyze dest.contexts;

INFO:  vacuuming "dest.contexts"
INFO:  index "pk_contexts" now contains 4845 row versions in 21 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_contexts_" now contains 4845 row versions in 37 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "contexts": found 0 removable, 4845 nonremovable row versions in 41 out of 41 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "dest.contexts"
INFO:  "contexts": scanned 41 of 41 pages, containing 4845 live rows and 0 dead rows; 4845 rows in sample, 4845 estimated total rows

And here I finally get what I want:

Index Only Scan using idx_contexts_ on contexts  (cost=0.28..4.40 rows=6 width=17) (actual time=0.014..0.014 rows=0 loops=1)
  Index Cond: ((id = 456) AND (is_enabled = true))
  Filter: is_enabled
  Heap Fetches: 0
Planning time: 0.247 ms
Execution time: 0.052 ms
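
(For completeness, a sketch of one more way to confirm that the heap is no longer touched: explain (analyze, buffers) also reports buffer usage per plan node, so together with Heap Fetches: 0 it shows that only index pages are read.)

explain (analyze, buffers)
select ranking_value, index_min, index_max, case_sensitive
from dest.contexts
where id = 456 and is_enabled;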

So here are the questions:

  1. Why doesn't analyze teach it to use the large 'all-covering' index?
  2. Why does vacuum analyze do it?
  3. My table was filled from scratch with one large insert. Why does vacuum do anything at all? To my mind, there is nothing there to vacuum.
asked Sep 01 '17 by greatvovan


1 Answer

Analyze is responsible for collecting statistics, and those statistics affect the selection of the query plan and its costs.
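
(As a quick illustration — a sketch, assuming the table from the question — the statistics that analyze gathers can be inspected through the standard pg_stats view:)

select attname, n_distinct, null_frac
from pg_stats
where schemaname = 'dest' and tablename = 'contexts';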

Indexes do not contain visibility information about the tuples. Because of that, reading the data pages and filtering the result set is still required even though all columns in the select are in the index. If all tuples on a page are visible, Postgres does not need that additional work; it uses the visibility map (vm) to know which pages qualify.

Vacuum both uses and updates the visibility map while reclaiming dead tuples. Because of that, vacuum changes the query plan to an index-only scan where possible.
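
(To see the effect directly — again a sketch, using only the standard pg_class catalog — relallvisible shows how many pages are marked all-visible in the visibility map; it stays at 0 after a plain analyze and only goes up after vacuum:)

select relname, relpages, relallvisible
from pg_class
where oid = 'dest.contexts'::regclass;

-- before vacuum: relallvisible = 0, so an index-only scan would still need heap fetches
-- after vacuum:  relallvisible is close to relpages, and the index-only scan becomes cheap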

https://www.postgresql.org/docs/9.6/static/sql-vacuum.html

https://www.postgresql.org/docs/9.6/static/storage-vm.html

answered Sep 19 '22 by samed.yildirim