We have a string String s = "first.second.third...n-1.n";
Which of the two regex approaches are more efficient in Java?
s = s.replaceFirst(".*?\\.", "");
or
s = s.replaceAll('^[^.]+[.]', '');
They do the same thing, but I wonder which one is faster?
The differences are:
using anchored regex vs. replaceFirst()
to only match the first instance
using non-greedy *?
vs a non-dot character class [^.]
using \\.
literal vs. [.]
character class.
I would prefer an answer which would benchmark or explain the performance effect of those separately.
The second regexp is more efficient, because it does not backtrack.
Here is a link to a nice article explaining the details. The article explains how the expression
<.*?>
takes 25 steps, while the expression
<[^>]*>
takes only five to find a match in a <0123456789>
string, illustrating each of the steps the regex engine needs to take in order to produce a match.
There should be no difference between \\.
and [.]
- good regexp engines will convert both sub-expressions to the same compiled expression.
The anchored version with replaceAll
is not doing the same thing as non-anchored replaceFirst
, because the anchored version will not find a match when a dot .
is the first character in the string. You can fix this by replacing +
with a *
.
With this difference out of the way, replaceAll
will spend a little more time checking that there are no other matches (and there wouldn't be, because your expression is anchored), but that would not be significant for strings with long initial runs containing no dots.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With