Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is more efficient: replaceFirst(), or replaceAll() with an anchored regex?

We have a string String s = "first.second.third...n-1.n";

Which of the two regex approaches are more efficient in Java?

s = s.replaceFirst(".*?\\.", "");

or

s = s.replaceAll('^[^.]+[.]', '');

They do the same thing, but I wonder which one is faster?

The differences are:

  1. using anchored regex vs. replaceFirst() to only match the first instance

  2. using non-greedy *? vs a non-dot character class [^.]

  3. using \\. literal vs. [.] character class.

I would prefer an answer which would benchmark or explain the performance effect of those separately.

like image 286
DVK Avatar asked Oct 07 '22 18:10

DVK


1 Answers

The second regexp is more efficient, because it does not backtrack.

Here is a link to a nice article explaining the details. The article explains how the expression

<.*?>

takes 25 steps, while the expression

<[^>]*>

takes only five to find a match in a <0123456789> string, illustrating each of the steps the regex engine needs to take in order to produce a match.

There should be no difference between \\. and [.] - good regexp engines will convert both sub-expressions to the same compiled expression.

The anchored version with replaceAll is not doing the same thing as non-anchored replaceFirst, because the anchored version will not find a match when a dot . is the first character in the string. You can fix this by replacing + with a *.

With this difference out of the way, replaceAll will spend a little more time checking that there are no other matches (and there wouldn't be, because your expression is anchored), but that would not be significant for strings with long initial runs containing no dots.

like image 192
Sergey Kalinichenko Avatar answered Oct 10 '22 02:10

Sergey Kalinichenko