Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

unaccent() preventing index usage in Postgres

I want to retrieve a way with a given name from an OpenStreetMap database imported into PostgreSQL 9.3.5, the OS is Win7 64-bit. In order to be a bit failure tolerant, I use the unaccent extension of Postgres.

My query looks as follows:

SELECT * FROM germany.ways
WHERE lower(tags->'name') like lower(unaccent('unaccent','Weststrasse'))

Query plan:

Seq Scan on ways  (cost=0.00..2958579.31 rows=122 width=465)
  Filter: (lower((tags -> 'name'::text)) ~~ lower(unaccent('unaccent'::regdictionary, 'Weststrasse'::text)))

The strange thing is that this query uses a sequential scan on ways, although an index is present on lower(tags->'name'):

CREATE INDEX ways_tags_name ON germany.ways (lower(tags -> 'name'));

Postgres uses the index as soon as I remove unaccent from the query:

SELECT * FROM germany.ways
WHERE lower(tags->'name') like lower('Weststrasse')

Query plan:

Index Scan using ways_tags_name on ways  (cost=0.57..495.43 rows=122 width=465)
  Index Cond: (lower((tags -> 'name'::text)) = 'weststrasse'::text)
  Filter: (lower((tags -> 'name'::text)) ~~ 'weststrasse'::text)

Why is unaccent preventing Postgres from using the index? In my opinion this doesn't make sense because the result of unaccent (diacritics removal, etc.) should already be completely known before the actual query is executed. So Postgres should be able to use the index. How can the seq scan be avoided when using unaccent?

like image 295
Alf Kortig Avatar asked Mar 06 '15 12:03

Alf Kortig


1 Answers

IMMUTABLE variant of unaccent()

To clarify the misinformation in the currently accepted, incorrect answer:
Expression indexes only allow IMMUTABLE functions (for obvious reasons) and unaccent() is only STABLE. The solution you suggested in the the comment is also problematic. Detailed explanation and a proper solution for that:

  • Does PostgreSQL support "accent insensitive" collations?

Depending on the content of tags->name it may be useful to add unaccent() to the expression index, but that's orthogonal to the question why the index wasn't being used:

  • PostgreSQL accent + case insensitive search

Actual problem / solution

The operator LIKE in your query is subtly wrong (most likely). You do not want to interpret 'Weststrasse' as search pattern, you want to match the (normalized) string as is. Replace with the = operator, and you will see a (bitmap) index scan with your current index, irregardless of the function volatility of unaccent():

SELECT * FROM germany.ways
WHERE lower(tags->'name') = lower(unaccent('unaccent','Weststrasse'))

Why?

The right operand of LIKE is a pattern. Postgres cannot use a plain btree index for pattern matching (exceptions apply). A LIKE with a plain string as pattern (no special characters) can be optimized with an equality check on the btree index. But if there are special characters in the string, this index is out.

If there is an IMMUTABLE function to the right of LIKE, it can be evaluated immediately and the said optimisation is still possible. Per documentation on Function Volatility Categories:

IMMUTABLE ...
This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments.

The same is not possible with a lesser function volatility (STABLE or VOLATILE). That's why your "solution" of faking an IMMUTABLE unaccent() seemed to work, but it's really putting lipstick on a pig.

To reiterate:

  • If you want to work with LIKE and patterns, use a trigram index.
  • If you don't want to work with LIKE and patterns, use the equality operator =
like image 186
Erwin Brandstetter Avatar answered Sep 28 '22 06:09

Erwin Brandstetter