Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

REGEXP performance (compare with "LIKE" and "=")

I am using MySQL. I have asked a question about how to query in database for a single word match here.

  • There is an answer which suggest me to use REGEXP '[[:<:]]word[[:>:]]'

    It is a good answer, however, I am not sure how is this REGEXP '[[:<:]]word[[:>:]]' thing from performance perspective? If I have a large table, is this way harm the performance of my application?

For example, compare with = operation, e.g. WHERE column_name='value', is the REGEXP operation far more slow than = for large table?

  • There is another answer which suggested me to use LIKE, but I think it is not good from performance point of view.

    Then, I googled and found an article which says use LIKE is even faster than REGEXP . I get confused, which way I should use for a single word match query in a large table...

Can I say, = is the fastest operation, then LIKE , and REGEXP is the poorest one from performance perspective?

like image 905
Mellon Avatar asked Nov 16 '11 14:11

Mellon


People also ask

Is REGEXP faster than like?

Also LIKE is much faster than REGEXP.

Is like or Rlike faster?

While neither of these operations (suffix matching) could leverage table indices, the LIKE operator ran 5-6 times faster than the RLIKE / REGEXP operator to find the same data.

Is regex matching slow?

My experience shows that most of the time developers focus on correctness of a regex, leaving aside its performance. Yet matching a string with a regex can be surprisingly slow. So slow it can even stop any JS app or take 100% of a server CPU time causing denial of service (DOS).

What is difference between like and regex?

Basically, LIKE does very simple wildcard matches, and REGEX is capable of very complicated wildcard matches. In fact, regular expressions ( REGEX ) are so capable that they are [1] a whole study in themselves [2] an easy way to introduce very subtle bugs.


1 Answers

Regarding regexp

The regexp can never use an index in MySQL.
The = will use an index if:

  • an index is declared on the column;
  • the values in the column have sufficient cardinality (if more than +/- 20% of the rows match, MySQL will not use an index, because in that case doing a full table scan is faster);
  • No other indexes on the same table are better suited (MySQL can only use one index per table per subselect);

Considering these and some other more esoteric caveats an = comparison is much faster than a regexp.

Regarding like

LIKE can use an index if the wildcard is not the first char.

SELECT * FROM t WHERE a LIKE 'abc'   <<-- (case insensitive `=`) can use an index
SELECT * FROM t WHERE a LIKE 'abc%'  <<-- can use an index
SELECT * FROM t WHERE a LIKE 'a%'    <<-- can use an index, depending on cardinality
SELECT * FROM t WHERE a LIKE '%a%'   <<-- cannot use an index
SELECT * FROM t WHERE a LIKE '_agf'  <<-- cannot use an index

The performance of like when using an index is very close to = (assuming the same number of rows returned).

like image 58
Johan Avatar answered Sep 27 '22 23:09

Johan