A friend gave me this piece of code and said there is a bug. And yes, this code runs for ever.
The answer I got is:
It runs for >10^15 years before printing anything.
public class Match {
public static void main(String[] args) {
Pattern p = Pattern.compile("(aa|aab?)+");
int count = 0;
for(String s = ""; s.length() < 200; s += "a")
if (p.matcher(s).matches())
count++;
System.out.println(count);
}
}
I didn't really understand why am I seeing this behavior, I am new to java, do you have any suggestions?
The pattern you are using is known as an evil regex according to OWASP (they know what they're talking about most of the time):
https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
It basically matches aa
OR aa
or aab
(since the b is optional by addition of ?
)
A Regex like this is vulnerable to a ReDoS or Regex Denial of Service Attack.
So yes, sort out what you want to match. I suggest in the above example you should simply match aa
, no need for groups, repitition or alternation:
Pattern p = Pattern.compile("aa");
Also as someone pointed out, who now deleted his post, you should not use += to append to strings. You should use a StringBuffer instead:
public class Match {
public static void main(String[] args) {
Pattern p = Pattern.compile("aa");
StringBuffer buffy = new StringBuffer(200);
int count = 0;
for(int i = 0; i < 200; i++){
buffy.append("a")
if (p.matcher(buffy.toString()).matches()){
count++;
}
}
System.out.println(count);
}
}
The regular expression (aa|aab?)+
is one that takes an especially long time for the regular expression engine to handle. These are colorfully called evil regexes. It is similar to the (a|aa)+
example at the link. This particular one is very slow on a string composed entirely of a
s.
What this code does is check the evil regex against increasingly long strings of a
s, up to length 200, so it certainly ought to take a long time, and it doesn't print until the loop ends. I'd be interested to know where the 10^15 years figure came from.
OK, the 10^15 (and in fact the entire piece of code in the question) comes from this talk, slide 37. Thanks to zengr for that link. The most relevant piece of information to the question is that the check for this regex takes time that is exponential in the length of the string. Specifically it's O(2^(n/2)), so it takes 2^99 (or so) times longer to check the last string than the first one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With