Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL performance difference between LIKE and regex

Could someone explain such a big performance difference between these SQLs ?

SELECT count(*) as cnt FROM table WHERE name ~ '\*{3}'; -- Total runtime 12.000 - 18.000 ms
SELECT count(*) as cnt FROM table WHERE name ~ '\*\*\*'; -- Total runtime 12.000 - 18.000 ms
SELECT count(*) as cnt FROM table WHERE name LIKE '%***%'; -- Total runtime 5.000 - 7.000 ms

As you can see, the difference is more than double between LIKE operator and simple regular expression (I thought LIKE operator internally would be converted into the regular expression and there shouldn't be any difference)

There are almost 13000 rows there and the column "name" is of "text" type. There are no indexes related to the "name" column defined in the table.

EDIT:

EXPLAIN ANALYZE OF EACH OF THEM:

EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*{3}';

Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=18.279..18.280 rows=1 loops=1)
  ->  Seq Scan on datos (cost=0.00..894.31 rows=1 width=0) (actual time=0.620..18.266 rows=25 loops=1)
        Filter: (nombre ~ '\*{3}'::text)
Total runtime: 18.327 ms

EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*\*\*';
Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=17.404..17.405 rows=1 loops=1)
  ->  Seq Scan on datos  (cost=0.00..894.31 rows=1 width=0) (actual time=0.608..17.396 rows=25 loops=1)
        Filter: (nombre ~ '\*\*\*'::text)
Total runtime: 17.451 ms

EXPLAIN ANALYZE SELECT count(*) as cnt  FROM datos WHERE nombre LIKE '%***%';
Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=4.258..4.258 rows=1 loops=1)
  ->  Seq Scan on datos  (cost=0.00..894.31 rows=1 width=0) (actual time=0.138..4.249 rows=25 loops=1)
        Filter: (nombre ~~ '%***%'::text)
Total runtime: 4.295 ms
like image 253
dmikam Avatar asked Apr 06 '15 08:04

dmikam


People also ask

Is regex faster than like?

Yeah, it probably would be a tiny bit faster because standard-SQL LIKE is a simpler comparison operation than a full-on regex parser. However, in real terms both are really slow, because neither can use indices.

Is like or Rlike faster?

-- When using the RLIKE operator, three attempts ran in: 144s, 141s, 140s. While neither of these operations (suffix matching) could leverage table indices, the LIKE operator ran 5-6 times faster than the RLIKE / REGEXP operator to find the same data.

Is Ilike slower than like?

The bare expression lower(description) LIKE '%abc%' is typically a bit faster than description ILIKE '%abc%' , and either is a bit faster than the equivalent regular expression: description ~* 'abc' .

What is the difference between the like and regexp operators?

Basically, LIKE does very simple wildcard matches, and REGEX is capable of very complicated wildcard matches. In fact, regular expressions ( REGEX ) are so capable that they are [1] a whole study in themselves [2] an easy way to introduce very subtle bugs.


1 Answers

The text LIKE text operator (~~) is implemented by specific C code in like_match.c. It's ad-hoc code that is completely independent from regular expressions. Looking at the comments, it's obviously specially optimized to implement only % and _ as wildcards, and short-circuiting to an exit whenever possible, whereas a regular expression engine is more complex by several orders of magnitude.

Note that in your test case , just like the regexp is suboptimal compared to LIKE, LIKE is probably suboptimal compared to strpos(name, '***') > 0

strpos is implemented with the Boyer–Moore–Horspool algorithm which is optimized for large substrings with few partial matches in the searched text.

Internally these functions are reasonably optimized but when there are several methods to the same goal, choosing the likely best is still the job of the caller. PostgreSQL will not analyze for us the pattern to match and switch a regexp into a LIKE or a LIKE into a strpos based on that analysis.

like image 58
Daniel Vérité Avatar answered Oct 28 '22 17:10

Daniel Vérité