Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java Pattern.matcher() freezes when matching line that contains \n

Tags:

java

regex

I came across a problem I find quite interesting. I'm doing some basic parsing of text files mostly by regex and it always freezes when matching this line

ftrect 0.7031 57.0313 9.8561 55.5313 "FREIGABE \nQ09_SV01"

No exception is thrown; the program just hangs. I'm posting the snippet of the program that recreates the situation; the commented one is possible standard situation, but the other one is the problematic. If you delete the \n it works, but these parsed files comes like this from "blackbox" system.

I surely can make a workaround, I only found it interesting that it actually freezes and hoped that someone could explain what is happening. I tried it on JDK6u22 and JDK7u21...

public static Pattern FTRECT_PATTERN = Pattern.compile(
    "\\s*([\\w]+)?\\:?\\s*ftrect\\s+((\\d*\\.?\\d*\\s?)+)\\s*\"?([\\w\\s\\.\\%\\/\\=]*)?\"?\\s*"
);

public static void main(String[] args) {

//  Matcher m = FTRECT_PATTERN.matcher( "FOX_BACKGROUND: ftrect 46.1719 18.0556 54.8633 16.5556 \"Schicht\" " );
    Matcher m = FTRECT_PATTERN.matcher( "ftrect 0.7031 57.0313 9.8561 55.5313 \"FREIGABE \\nQ09_SV01\"" );
    System.out.println( m.matches() );

    for (int i = 0; i <= m.groupCount(); i++) {
        String string = m.group( i );
        System.out.println( string );
    }
}

Well, I discovered that if I modify the regex to this (adding the \\\\ to the last group):

public static Pattern FTRECT_PATTERN = Pattern.compile(
    "\\s*([\\w]+)?\\:?\\s*ftrect\\s+((\\d*\\.?\\d*\\s?)+)\\s*\"?([\\w\\\\\\s\\.\\%\\/\\=]*)?\"?\\s*"
);

I still don't know why no exception is thrown.

like image 535
Kamil Szabo Avatar asked May 23 '13 15:05

Kamil Szabo


1 Answers

This happens because of catastrophic backtracking. Your test string contains a literal backslash (in "...\\n...") that isn't matched by the character class [\w\s\.\%\/\=]*.

This means that the regex engine has to try all possible permutations of the string "FREIGABE that precedes the offending character before being able to decide on a non-match.

That's a very high number, keeping the engine busy for a few hours, probably. Once you added the backslash to the character class, the regex can match.

Prevention: Use possessive quantifiers (*+ and ++) to avoid unhelpful backtracking:

public static Pattern FTRECT_PATTERN = Pattern.compile( "\\s*([\\w]+)?\\:?\\s*ftrect\\s+((\\d*\\.?\\d*\\s?)++)\\s*\"?([\\w\\s\\.\\%\\/\\=]*+)?\"?\\s*" );

A better, cleaned-up solution would be:

public static Pattern FTRECT_PATTERN = Pattern.compile("\\s*(\\w*):?\\s*ftrect\\s+((\\b\\d*(?:\\.\\d+)?\\b\\s?)+)\\s*\"?([\\\\\\w\\s.%/=]*+)?\"?\\s*");
like image 144
Tim Pietzcker Avatar answered Sep 18 '22 12:09

Tim Pietzcker