Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance implication of LIKE query when operating on a subset of full table

I appreciate that LIKE queries are slow as they cannot be indexed. However, I am curious about the performance hit in a situation like this:

Say I have a table like:

user_id  |  message 
-------------------
   1     |  foo bar baz
   1     |  bar buz qux
   .     .      .
   .     .      .
   2     |  bux bar foo
   2     |  bar

where I have say 1 million rows, but 10,000 users, so each user has about 100 messages.

Clearly a search like:

SELECT * FROM table WHERE message like '%ar%';

is going to be very slow. However in my application I would only ever search a user's messages:

SELECT * FROM table WHERE message like '%ar%' AND user_id = 2;

where the user_id column would be indexed.

Am I right in thinking that in a scenario like this, Postgres would only ever perform the slow LIKE query on the users ~100 rows, after using the indexed user_id column, rather than the full table - thus limiting my performance hit?

And also that a query like this wouldn't get significantly slower with 10 or 100 million users, as long as any one user only had ~100 messages?

like image 315
latentflip Avatar asked Dec 21 '22 00:12

latentflip


1 Answers

@MatBailie already cleared your primary question. I want to address an assertion of yours:

I appreciate that LIKE queries are slow as they cannot be indexed.

This is not entirely true.

Firstly, and this has been true for a long time, left anchored patterns can use an index. This works for regular expressions (~) as well as LIKE (~~) and SIMILAR TO. I recently wrote a comprehensive review on the matter on dba.SE:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

This may not work for you, because the patterns in your question are not anchored. If they were you could get optimized performance with a multicolumn index that uses the text pattern operator class text_pattern_ops for the message column like this:

CREATE INDEX tbl_user_id_message_idx ON tbl (user_id, message text_pattern_ops);

For queries like:

SELECT *
FROM   tbl
WHERE  user_id = 2
AND    message ~~ 'bar%'; -- left anchored LIKE

Secondly, since PostgreSQL 9.1 you can use the pg_trgm extension and create a GIST or GIN index with it, that all patterns can use. Some limitations apply. Maintenance of such an index is more expensive, so it is most useful for read-only or rarely written tables. Details:

  • PostgreSQL LIKE query performance variations

Depesz has a related tutorial.

like image 51
Erwin Brandstetter Avatar answered Dec 25 '22 22:12

Erwin Brandstetter