
High performance simple Java regular expressions

Tags: java, regex

Part of the code I'm working on uses a bunch of regular expressions to search for some simple string patterns (e.g., patterns like "foo[0-9]{3,4} bar"). Currently, we use statically compiled Java Patterns and then call Pattern#matcher to check whether a string contains a match for the pattern (I don't need the match itself, just a boolean indicating whether there is one). This causes a noticeable amount of memory allocation, which is affecting performance.

Is there a better option for Java regex matching that is faster or at least doesn't allocate memory every time it searches a string for a pattern?

jonderry asked Sep 21 '11 18:09



2 Answers

Try the Matcher.reset(CharSequence) method, e.g. matcher.reset("newinputtext"), to avoid creating a new Matcher each time you call Pattern.matcher.
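A minimal sketch of that idea, using the pattern from the question. The class name ReusableMatcher and the method containsMatch are made up for illustration. It assumes each instance is used by one thread only, since Matcher is not safe for concurrent use:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusableMatcher {
    // Pattern from the question; class and method names are hypothetical.
    private static final Pattern PATTERN = Pattern.compile("foo[0-9]{3,4} bar");

    // Created once per instance and reused for every input.
    // Matcher is not thread-safe, so do not share an instance across threads.
    private final Matcher matcher = PATTERN.matcher("");

    public boolean containsMatch(CharSequence input) {
        // reset(CharSequence) swaps in the new input and discards old match state.
        return matcher.reset(input).find();
    }

    public static void main(String[] args) {
        ReusableMatcher rm = new ReusableMatcher();
        System.out.println(rm.containsMatch("xx foo123 bar yy"));
        System.out.println(rm.containsMatch("foo12 bar"));
    }
}
```

In multi-threaded code, one option would be to keep one such instance per thread (for example in a ThreadLocal) rather than sharing it.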

Narendra Yadala answered Oct 19 '22 03:10


If you want to avoid creating a new Matcher for each Pattern, use the usePattern() method, like so:

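import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsePatternDemo {
  public static void main(String[] args) {
    Pattern[] pats = {
      Pattern.compile("123"),
      Pattern.compile("abc"),
      Pattern.compile("foo")
    };
    String s = "123 abc";
    // Create one Matcher up front; the initial pattern is only a placeholder.
    Matcher m = Pattern.compile("dummy").matcher(s);
    for (Pattern p : pats) {
      // reset() rewinds to the start of the input; usePattern() swaps the pattern.
      System.out.printf("%s : %b%n", p.pattern(), m.reset().usePattern(p).find());
    }
  }
}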

See the demo on Ideone.

You have to call the matcher's reset() method too; otherwise find() will only search from the point where the previous match ended (assuming that match was successful), because usePattern() keeps the matcher's current position in the input.
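To see why the reset() call matters, here is a small sketch. The class name ResetDemo and the helper findSecondPattern are made up for illustration. It first matches "abc" at the end of the input, then switches to "123" with and without rewinding:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetDemo {
    private static final String INPUT = "123 abc";

    // Hypothetical helper: match "abc" first, then look for "123"
    // with or without calling reset() in between.
    public static boolean findSecondPattern(boolean resetFirst) {
        Matcher m = Pattern.compile("abc").matcher(INPUT);
        m.find(); // succeeds; the match ends at the end of the input
        if (resetFirst) {
            m.reset(); // rewind to the start of the input
        }
        // usePattern() swaps the pattern but keeps the current position.
        return m.usePattern(Pattern.compile("123")).find();
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + findSecondPattern(false));
        System.out.println("with reset:    " + findSecondPattern(true));
    }
}
```

Without reset() the search for "123" starts after the earlier "abc" match and finds nothing. With reset() it starts from the beginning and succeeds.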

Alan Moore answered Oct 19 '22 04:10
