Which of these is more performant, or (if equivalent) which one reads better? I'm trying to match everything inside a pair of parentheses.
Pattern p1 = Pattern.compile("\\([^)]*\\)");
Pattern p2 = Pattern.compile("\\(.*?\\)");
To me, the second reads better but uses the possibly confusing reluctant quantifier, and I'm unsure if that causes a performance loss.
EDIT
Don't miss the answer which shows this is even better:
Pattern p3 = Pattern.compile("\\([^)]*+\\)");
\([^)]*\)
will be faster, albeit not noticeable if the input is small. A better gain is likely to occur when you make [^)]*
possessive: [^)]*+
. That way, the regex engine will not keep track of all the chars [^)]*
matches in case it needs to backtrack (which is not going to happen in case of [^)]*\)
). Making a pattern possessive causes the regex engine to not remember the chars this pattern has matched.
Again, this might not be noticeable, but if your input gets large(r), I'm pretty sure* the difference between .*?
and [^)]*
is less than that between [^)]*
and [^)]*+
.
* run some benchmarks to be sure!
This one has better performance compared to the p2
, the non-greedy way, which will cause backtracking.
Pattern p1 = Pattern.compile("\\([^)]*\\)");
Look at this article.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With