Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression hangs program (100% CPU usage)

Tags:

java

regex

Java is hanging with 100% CPU usage when I use the below string as input for a regular expression.

RegEx Used:

Here is the regular expression used for the description field in my application.

^([A-Za-z0-9\\-\\_\\.\\&\\,]+[\\s]*)+ 

String used for testing:

SaaS Service VLAN from Provider_One
2nd attempt with Didier SPT because the first one he gave me was wrong :-(

It works properly when I split the same string in different combinations. Like "SaaS Service VLAN from Provider_One", "first one he gave me was wrong :-(", etc. Java is hanging only for the above given string.

I also tried optimizing the regex as below.

^([\\w\\-\\.\\&\\,]+[\\s]*)+ 

Even with this is not working.

like image 516
user1531484 Avatar asked Jul 17 '12 12:07

user1531484


1 Answers

Another classic case of catastrophic backtracking.

You have nested quantifiers that cause a gigantic number of permutations to be checked when the regex arrives at the : in your input string which is not part of your character class (assuming you're using the .matches() method).

Let's simplify the problem to this regex:

^([^:]+)+$ 

and this string:

1234: 

The regex engine needs to check

1234    # no repetition of the capturing group 123 4   # first repetition of the group: 123; second repetition: 4 12 34   # etc. 12 3 4  1 234 1 23 4 1 2 34 1 2 3 4 

...and that's just for four characters. On your sample string, RegexBuddy aborts after 1 million attempts. Java will happily keep on chugging... before finally admitting that none of these combinations allows the following : to match.

How can you solve this?

You can forbid the regex from backtracking by using possessive quantifiers:

^([A-Za-z0-9_.&,-]++\\s*+)+ 

will allow the regex to fail faster. Incidentally, I removed all those unnecessary backslashes.

Edit:

A few measurements:

On the string "was wrong :-)", it takes RegexBuddy 862 steps to figure out a non-match.
For "me was wrong :-)", it's 1,742 steps.
For "gave me was wrong :-)", 14,014 steps.
For "he gave me was wrong :-)", 28,046 steps.
For "one he gave me was wrong :-)", 112,222 steps.
For "first one he gave me was wrong :-)", >1,000,000 steps.

like image 149
Tim Pietzcker Avatar answered Sep 17 '22 21:09

Tim Pietzcker