Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bug in java.util.regex in sun jdk 6.0.24?

Tags:

java

regex

The following code blocks on my system. Why?

System.out.println( Pattern.compile( 
   "^((?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*)/\\*.*?\\*/(.*)$", 
   Pattern.MULTILINE | Pattern.DOTALL ).matcher( 
   "\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';"
   ).matches() );

The pattern (designed to detect comments of the form /*...*/ but not within ' or ") should be fast, as it is deterministic... Why does it take soooo long?

like image 875
Steffen Heil Avatar asked Jan 19 '23 17:01

Steffen Heil


2 Answers

You're running into catastrophic backtracking.

Looking at your regex, it's easy to see how .*? and (.*) can match the same content since both also can match the intervening \*/ part (dot matches all, remember). Plus (and even more problematic), they can also match the same stuff that ((?:[^'"][^'"]*|"[^"]*"|'[^']*')*) matches.

The regex engine gets bogged down in trying all the permutations, especially if the string you're testing against is long.

I've just checked your regex against your string in RegexBuddy. It aborts the match attempt after 1.000.000 steps of the regex engine. Java will keep churning on until it gets through all permutations or until a Stack Overflow occurs...

You can greatly improve the performance of your regex by prohibiting backtracking into stuff that has already been matched. You can use atomic groups for this, changing your regex into

^((?>[^'"]+|"[^"]*"|'[^']*')*)(?>/\*.*?\*/)(.*)$

or, as a Java string:

"^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$"

This reduces the number of steps the regex engine has to go through from > 1 million to 58.

Be advised though that this will only find the first occurrence of a comment, so you'll have to apply the regex repeatedly until it fails.

Edit: I just added two slashes that were important for the expressions to work. Yet I had to change more than 6 characters.... :(

like image 176
Tim Pietzcker Avatar answered Jan 28 '23 17:01

Tim Pietzcker


I recommend that you read Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...).

like image 30
NPE Avatar answered Jan 28 '23 15:01

NPE