I tried to recreate regular expression denial of service attack using (a+)+ regexp and aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! (with large amounts of a) input using jshell:
Pattern.compile("(a+)+")
    .matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")
    .matches()
But this completes pretty quickly each time I tried. Is the regexp implementation in Java different from others? Or the linked wikipedia page is wrong?
(BTW. I'm using Java 11, if that's relevant)
EDIT: Looks like it is Java version related, when I tried it on Java 8, it hangs, but in Java 9 and 11 it works right away. What did change between those versions that could affect that? Are all regex safe now in Java?
Is there a specific Java JEP that changed the regexp implementation? I would like to know what kind of regexps are still a problem for newer Java.
According to the article RSPEC-2631, the ReDoS issue has been handled in Java 9 and later:
Java runtimes like OpenJDK 9+ are mitigating this problem by having additional protections in their implementation of regular expression evaluation. In those runtime the example above is not vulnerable.
I am currently running Java 8 and the following code hangs:
Pattern.compile("(a|aa)+")
       .matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab")
       .matches()
Seeing as how you are using Java 11 (and have also tested it with Java 9/10) and have seen it take a small amount of time to complete, there was obviously a change that was made between those versions.
From looking at the source code of Matcher in Java 11, we find the following addition that isn't present in Java 8:
/**
 * Storage used by top greedy Loop node to store a specific hash set to
 * keep the beginning index of the failed repetition match. The nodes
 * themselves are stateless, so they rely on this field to hold state
 * during a match.
 */
IntHashSet[] localsPos;
This local storage, along with a large amount of other code added, seems to be one of the main reasons why the state machine for regular expressions in Java 9+ completes much faster than in Java 8 and below.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With