I came across a problem I find quite interesting. I'm doing some basic parsing of text files mostly by regex and it always freezes when matching this line
ftrect 0.7031 57.0313 9.8561 55.5313 "FREIGABE \nQ09_SV01"
No exception is thrown; the program just hangs. I'm posting the snippet of the program that recreates the situation; the commented one is possible standard situation, but the other one is the problematic. If you delete the \n it works, but these parsed files comes like this from "blackbox" system.
I surely can make a workaround, I only found it interesting that it actually freezes and hoped that someone could explain what is happening. I tried it on JDK6u22 and JDK7u21...
public static Pattern FTRECT_PATTERN = Pattern.compile(
"\\s*([\\w]+)?\\:?\\s*ftrect\\s+((\\d*\\.?\\d*\\s?)+)\\s*\"?([\\w\\s\\.\\%\\/\\=]*)?\"?\\s*"
);
public static void main(String[] args) {
// Matcher m = FTRECT_PATTERN.matcher( "FOX_BACKGROUND: ftrect 46.1719 18.0556 54.8633 16.5556 \"Schicht\" " );
Matcher m = FTRECT_PATTERN.matcher( "ftrect 0.7031 57.0313 9.8561 55.5313 \"FREIGABE \\nQ09_SV01\"" );
System.out.println( m.matches() );
for (int i = 0; i <= m.groupCount(); i++) {
String string = m.group( i );
System.out.println( string );
}
}
Well, I discovered that if I modify the regex to this (adding the \\\\
to the last group):
public static Pattern FTRECT_PATTERN = Pattern.compile(
"\\s*([\\w]+)?\\:?\\s*ftrect\\s+((\\d*\\.?\\d*\\s?)+)\\s*\"?([\\w\\\\\\s\\.\\%\\/\\=]*)?\"?\\s*"
);
I still don't know why no exception is thrown.
This happens because of catastrophic backtracking. Your test string contains a literal backslash (in "...\\n..."
) that isn't matched by the character class [\w\s\.\%\/\=]*
.
This means that the regex engine has to try all possible permutations of the string "FREIGABE
that precedes the offending character before being able to decide on a non-match.
That's a very high number, keeping the engine busy for a few hours, probably. Once you added the backslash to the character class, the regex can match.
Prevention: Use possessive quantifiers (*+
and ++
) to avoid unhelpful backtracking:
public static Pattern FTRECT_PATTERN = Pattern.compile( "\\s*([\\w]+)?\\:?\\s*ftrect\\s+((\\d*\\.?\\d*\\s?)++)\\s*\"?([\\w\\s\\.\\%\\/\\=]*+)?\"?\\s*" );
A better, cleaned-up solution would be:
public static Pattern FTRECT_PATTERN = Pattern.compile("\\s*(\\w*):?\\s*ftrect\\s+((\\b\\d*(?:\\.\\d+)?\\b\\s?)+)\\s*\"?([\\\\\\w\\s.%/=]*+)?\"?\\s*");
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With