Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java Regex Performance: Reluctant Quantifier or Character Class?

Tags:

java

regex

Which of these is more performant, or (if equivalent) which one reads better? I'm trying to match everything inside a pair of parentheses.

Pattern p1 = Pattern.compile("\\([^)]*\\)");
Pattern p2 = Pattern.compile("\\(.*?\\)");

To me, the second reads better but uses the possibly confusing reluctant quantifier, and I'm unsure if that causes a performance loss.

EDIT

Don't miss the answer which shows this is even better:

Pattern p3 = Pattern.compile("\\([^)]*+\\)");
like image 265
Cory Kendall Avatar asked Oct 05 '12 08:10

Cory Kendall


2 Answers

\([^)]*\) will be faster, albeit not noticeable if the input is small. A better gain is likely to occur when you make [^)]* possessive: [^)]*+. That way, the regex engine will not keep track of all the chars [^)]* matches in case it needs to backtrack (which is not going to happen in case of [^)]*\)). Making a pattern possessive causes the regex engine to not remember the chars this pattern has matched.

Again, this might not be noticeable, but if your input gets large(r), I'm pretty sure* the difference between .*? and [^)]* is less than that between [^)]* and [^)]*+.

* run some benchmarks to be sure!

like image 78
Bart Kiers Avatar answered Nov 15 '22 00:11

Bart Kiers


This one has better performance compared to the p2, the non-greedy way, which will cause backtracking.

Pattern p1 = Pattern.compile("\\([^)]*\\)");

Look at this article.

like image 41
xdazz Avatar answered Nov 14 '22 23:11

xdazz