Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create an index on an XML column in PostgreSQL with an xpath expression?

I'm running into this error when attempting to create a btree index on an XML data type column that uses an xpath expression on AuroraDB - PostgreSQL 9.6:

ERROR:  could not identify a comparison function for type xml
SQL state: 42883

This 2009 thread without a clear resolution is the only one I've found discussing this error message in regards to creating an xpath based index for a much earlier version of PostgreSQL: https://www.postgresql-archive.org/Slow-select-times-on-select-with-xpath-td2074839.html

In my case I do also need to specify namespaces as well and the original poster in that thread cast the result of the xpath expression to text[] which does get by the error for me too - but why is that even needed? I also don't see PostgreSQL ever using my index even when I have thousands of rows to go through.

So I tried out a simpler case and the error still occurs - please shed some light as to why if you could:

CREATE TABLE test
(
    id integer NOT NULL,
    xml_data xml NOT NULL,
    CONSTRAINT test_pkey PRIMARY KEY (id)
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;



CREATE INDEX test_idx
    ON test USING btree 
    (xpath('/book/title', xml_data))

and the resulting message is:

ERROR:  could not identify a comparison function for type xml
SQL state: 42883

The database encoding is UTF8. The Collation and Character Type are en_US.UTF-8.

Some sample insert statements too:

insert into source_data.test(id, xml_data) 
values(1, XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>1</chapter><chapter>2</chapter></book>'))

insert into source_data.test(id, xml_data) 
values(2, XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Apropos</title><chapter>1</chapter><chapter>2</chapter></book>'))
like image 697
Khorkrak Avatar asked Sep 04 '18 20:09

Khorkrak


1 Answers

Your getting this error because the XML data type does not provide any comparison operators, hence you can't create an index on the result of xpath(), because it returns an array of XML values.

Therefore you need to cast the XPath expression to a text array when creating the index:

CREATE INDEX test_idx
ON test USING BTREE 
    (cast(xpath('/book/title', xml_data) as text[])) ;

This index then gets used when querying the table:

EXPLAIN ANALYZE
SELECT * FROM test where
cast(xpath('/book/title', xml_data) as text[]) = '{<title>Apropos</title>}';

gives

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
Index Scan using test_idx on test  (cost=0.13..8.15 rows=1 width=36) (actual time=0.034..0.038 rows=1 loops=1)
    Index Cond: ((xpath('/book/title'::text, xml_data, '{}'::text[]))::text[] = '{<title>Apropos</title>}'::text[])
Planning time: 0.168 ms
Execution time: 0.073 ms (4 rows)

This works the same when using text():

CREATE INDEX test_idx
ON test USING BTREE 
    (cast(xpath('/book/title/text()', xml_data) as text[])) ;

explain analyze select * from test where
cast(xpath('/book/title/text()', xml_data) as text[]) = '{Apropos}';

gives

                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on test  (cost=0.13..8.15 rows=1 width=36) (actual time=0.034..0.038 rows=1 loops=1)
   Index Cond: ((xpath('/book/title/text()'::text, xml_data, '{}'::text[]))::text[] = '{Apropos}'::text[])
 Planning time: 0.166 ms
 Execution time: 0.076 ms
(4 rows)

Note that I forced the use of the index via the following command, as I only had 4 rows in the test table I created.

SET enable_seqscan TO off;
like image 172
Markus Avatar answered Nov 08 '22 06:11

Markus