I appreciate that LIKE queries are slow as they cannot be indexed. However, I am curious about the performance hit in a situation like this:
Say I have a table like:
user_id | message
-------------------
1 | foo bar baz
1 | bar buz qux
. . .
. . .
2 | bux bar foo
2 | bar
where I have say 1 million rows, but 10,000 users, so each user has about 100 messages.
Clearly a search like:
SELECT * FROM table WHERE message like '%ar%';
is going to be very slow. However in my application I would only ever search a user's messages:
SELECT * FROM table WHERE message like '%ar%' AND user_id = 2;
where the user_id
column would be indexed.
Am I right in thinking that in a scenario like this, Postgres would only ever perform the slow LIKE query on the users ~100 rows, after using the indexed user_id column, rather than the full table - thus limiting my performance hit?
And also that a query like this wouldn't get significantly slower with 10 or 100 million users, as long as any one user only had ~100 messages?
@MatBailie already cleared your primary question. I want to address an assertion of yours:
I appreciate that LIKE queries are slow as they cannot be indexed.
This is not entirely true.
Firstly, and this has been true for a long time, left anchored patterns can use an index. This works for regular expressions (~
) as well as LIKE
(~~
) and SIMILAR TO
. I recently wrote a comprehensive review on the matter on dba.SE:
This may not work for you, because the patterns in your question are not anchored. If they were you could get optimized performance with a multicolumn index that uses the text pattern operator class text_pattern_ops
for the message
column like this:
CREATE INDEX tbl_user_id_message_idx ON tbl (user_id, message text_pattern_ops);
For queries like:
SELECT *
FROM tbl
WHERE user_id = 2
AND message ~~ 'bar%'; -- left anchored LIKE
Secondly, since PostgreSQL 9.1 you can use the pg_trgm
extension and create a GIST or GIN index with it, that all patterns can use. Some limitations apply. Maintenance of such an index is more expensive, so it is most useful for read-only or rarely written tables. Details:
Depesz has a related tutorial.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With