Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

postgres 9.6 index only scan on functional index logically possible but not executed

Tags:

postgresql

I have read about functional indices and index-only scans in the docs / wiki published by Postgres.

I now have a query like:

SELECT(xpath('/document/uuid/text()', xmldata))[1]::text,  
      (xpath('/document/title/text()', xmldata))[1]::text
FROM xmltable
WHERE(xpath('/document/uuid/text()', xmldata))[1]::text = 'some-uuid-xxxx-xxxx'

and an index:

CREATE INDEX idx_covering_index on xmltable using btree (
    ((xpath('/document/uuid/text()', xmldata))[1]::text),     
    ((xpath('/document/title/text()', xmldata))[1]::text)
)

This index is, looking at it logically, a covering index and should enable an index-only-scan, as all queried values are contained in the index (uuid and title)

I now happen to know, that Postgres only recognizes covering indices on functional indices if the columns used in the function-calls are also contained

eg.:

SELECT to_upper(column1) from table where id >10

1) cannot be covered by this index:

CREATE INDEX idx_covering_index on xmltable using btree (id, to_upper(column1));

2) but can be covered by this one:

CREATE INDEX idx_covering_index on xmltable using btree (column1, id, to_upper(column1));

thus leading to an index-only-scan.

If I now try this with my xml setup:

CREATE INDEX idx_covering_index on xmltable using btree (xmldata,
    ((xpath('/document/uuid/text()', xmldata))[1]::text),     
    ((xpath('/document/title/text()', xmldata))[1]::text)
)

I get an error:

data type xml has no default operator class for access method "btree"

fair enough, unfortunately the normally used "text_ops" or "text_pattern_ops" do not accept "xml" as input - thus rendering my index - although it would cover all values - unable to support index-only-scans.

Can this be handled in a way providing the possibility for index-only-scans?

@EDIT1:

I know that postgres cannot use the index seen at 1) as covering index, but can use an index like 2)

I also tried with very simple tables to verify this behavior, and i also kind of remember to have read this - but i cannot for the life of me remember where.

create table test (
    id serial primary key,
    quote text
)



insert into test (number, quote) values ('I do not know any clever quotes');
insert into test (number, quote) values ('I am sorry');



CREATE INDEX idx_test_functional on test using btree ((regexp_replace(quote, '^I ', 'BillDoor ')));
set enable_seqscan = off;

analyze test;

explain select quote from test where regexp_replace(quote, '^I ', 'BillDoor ') = 'BillDoor do not know any clever quotes'

--> "Index Scan using idx_test_functional on test  (cost=0.13..8.15 rows=1 width=27)"

drop index idx_test_functional;
CREATE INDEX idx_test_functional on test using btree (quote, (regexp_replace(quote, '^I ', 'BillDoor ')));

analyze test;

explain select quote from test where regexp_replace(quote, '^I ', 'BillDoor ') = 'BillDoor do not know any clever quotes'

--> "Index Only Scan using idx_test_functional on test  (cost=0.13..12.17 rows=1 width=27)"

@EDIT2:

Full table definition of xmltable:

id serial primary key (clustered),
xmldata xml (only data used to filter queries)
history xml (never queried or read, just kept in case of legal inquiry)
fileinfo text (seldom quieried, sometimes retrieved)
"timestamp" timestamp (mainly for legal inquiries too)

The Table contains approx.: 500.000 Records, the xmldata is between 350 and 800 bytes in size, history is much larger but seldom retrieved and never used in filters

For the record, to be sure got real results, i always ran analyze xmltable after i created or dropped an index

a full execution plan for the query:

explain analyze select (xpath('/document/uuid/text()', d.xmldata))[1]::text as uuid
from xmltable as d
where
(xpath('/document/uuid/text()', d.xmldata))[1]::text = 'some-uuid-xxxx-xxxx' and (xpath('/document/genre/text()', d.xmldata))[1]::text = 'bio'

covered by these indizies:

create index idx_genre on xmltable using btree (((xpath('/document/genre/text()', xmldata))[1]::text));

create index idx_uuid on xmltable using btree (((xpath('/document/uuid/text()', xmldata))[1]::text)); 

create index idx_uuid_genre on xmltable using btree (((xpath('/document/uuid/text()', xmldata))[1]::text), ((xpath('/document/genre/text()', xmldata))[1]::text));

first leads to:

"Index Scan using idx_genre on xmldata d  (cost=0.42..6303.05 rows=18154 width=32)"
"  Index Cond: (((xpath('/document/genre/text()'::text, xmldata, '{}'::text[]))[1])::text = 'bio'::text)"
"  Filter: (((xpath('/document/uuid/text()'::text, xmldata, '{}'::text[]))[1])::text = 'some-uuid-xxxx-xxxx'::text)"

fair enough i thought, just for testing i'll force it to use the - in my mind - covering index:

drop index idx_uuid;
drop index idx_genre;

and now i get:

"Bitmap Heap Scan on xmltable d  (cost=551.13..16025.51 rows=18216 width=32)"
"  Recheck Cond: ((((xpath('/document/genre/text()'::text, xmldata, '{}'::text[]))[1])::text = 'bio'::text) AND (((xpath('/document/uuid/text()'::text, xmldata, '{}'::text[]))[1])::text = 'some-uuid-xxxx-xxxx'::text))"
"  ->  Bitmap Index Scan on idx_uuid_genre  (cost=0.00..546.58 rows=18216 width=0)"
"        Index Cond: ((((xpath('/document/genre/text()'::text, xmldata, '{}'::text[]))[1])::text = 'bio'::text) AND (((xpath('/document/uuid/text()'::text, xmldata, '{}'::text[]))[1])::text = 'some-uuid-xxxx-xxxx'::text))"

i also tried switching positions of uuid and genre in the index, same execution plan.

like image 720
billdoor Avatar asked Feb 13 '17 14:02

billdoor


People also ask

Why is Postgres not using my index?

As such, if Postgres chooses not to use your index, it is likely that it is calculating the cost of that query plan to be higher than the one it has chosen.

What is index only scan?

An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it's set, the row is known visible and so the data can be returned with no further work.

How do I stop index scan in postgresql?

To prevent Postgres using an index only scan we select more columns than the index contains. The index scan is much faster. The next step is to look at the the query by changing the condition to match all rows. An index scan is still faster, but the percentage of the difference is far smaller.


1 Answers

EDIT FINAL: Why it is impossible

According to documentation: postgresql can do indexonly scans when the index type is supporting this (i.e. btree always supports this, GiST and SpGiST only for some specific operators, and GIN is not capable at all). AND it is possible to reconstruct the original indexed value from the index.

The second requirement is most interesting.

In case of columns it is simple (a, b) and your index is able to reconstruct original stored value.

And in case of functions to make functional index working you should create index having the original values. This means (f1(a), f2(b)) index will come to table once again, because you can't reconstruct indexed data (a, b) from those values. The proposed by developers workaround is to create index (f1(a), f2(b), a, b) in this case the query planner is able to determine that it is possible to run the index-only scan, because index contains original data.

And coming back to your question, to create index only scan on xml column is impossible: there is no operators to support xml data comparison crucial for btree. There is no definition of comparison operators for xml data. and so you can't use this column in any kind of index, but you need it in your index-only scan to hint the query optimizer to perform index-only scan.

EDIT: (solution how to achieve index-only scan on specific xpath expressions)

If you know that those data will be used frequently, I would recommend to resolve this problem via trigger function and creating 2 more fields and cover them by index. Something like that:

ALTER TABLE public.xmltable ADD COLUMN xpath_uuid character varying(36);
ALTER TABLE public.xmltable ADD COLUMN xpath_title character varying(100);


CREATE INDEX idx_covering_materialized_xml_data
  ON public.xmltable
  USING btree
  (xpath_uuid COLLATE pg_catalog."default", xpath_title COLLATE pg_catalog."default");

CREATE OR REPLACE FUNCTION public.introduce_xml_materialization()
  RETURNS trigger AS
$BODY$BEGIN 

NEW.xpath_uuid = (xpath('/document/uuid/text()', NEW.xmldata))[1]::text;
NEW.xpath_title = (xpath('/document/title/text()', NEW.xmldata))[1]::text;

RETURN NEW; 
END;$BODY$
  LANGUAGE plpgsql STABLE
  COST 100;



CREATE TRIGGER index_xml_data
  BEFORE INSERT OR UPDATE
  ON public.xmltable
  FOR EACH ROW
  EXECUTE PROCEDURE public.introduce_xml_materialization();

and then you can do simply:

SELECT  xpath_uuid, xpath_title
  FROM public.xmltable
  where xpath_uuid = ' uuid1 '

which will show you index-only scan :

"Index Only Scan using idx_covering_materialized_xml_data on xmltable  (cost=0.14..8.16 rows=1 width=308)"
"  Index Cond: (xpath_uuid = ' uuid1 '::text)"

This approach would be optimal, assuming that data is read more than written. From the cost of insert or update it is generally same as creating functional index on xpath expressions.

ORIGINAL RESPONSE: (could be interesting for those willing to tweak the query optimizer)

Well the problem is that your query optimizer thinks that xPath function call is simplest. i.e. it is like calling simple math operator and its cost is 1. In this case the query optimizer thinks, that it is easier to fetch from table and calculate once again, then do pure index only scan.

IF you increase the xpath call cost to lets say 1000 the query optimizer will see that such call is significantly harder (that is actually truth) and will try to perform index-only scan. In my test setup I had executed

update pg_proc set procost=1 where proname='xpath';  

and the execution plan is

"Bitmap Heap Scan on xmltable  (cost=4.17..11.30 rows=3 width=64)"
"  Recheck Cond: (((xpath('/document/uuid/text()'::text, xmldata, '{}'::text[]))[1])::text = 'some-uuid-xxxx-xxxx'::text)"
"  ->  Bitmap Index Scan on idx_covering_index_3  (cost=0.00..4.17 rows=3 width=0)"
"        Index Cond: (((xpath('/document/uuid/text()'::text, xmldata, '{}'::text[]))[1])::text = 'some-uuid-xxxx-xxxx'::text)"

But when I do

update pg_proc set procost=1000 where proname='xpath';

Execution plan is switching to index-only scan

"Index Scan using idx_covering_index_3 on xmltable  (cost=0.15..31.20 rows=3 width=64)"
"  Index Cond: (((xpath('/document/uuid/text()'::text, xmldata, '{}'::text[]))[1])::text = 'some-uuid-xxxx-xxxx'::text)"

And on my volume (i.e. no data) the minimal cost of index-only query is significantly smaller than in original index + table scan, howerver the maximum cost is larger. Thus for you to cheat on queryoptimization might be required to place even higher values on xpath call cost.

Hope this will help, And out of curiosity, just show us the gains of using index-only querying.

like image 194
Ilya Dyoshin Avatar answered Sep 29 '22 07:09

Ilya Dyoshin