I tried to recreate regular expression denial of service attack using (a+)+
regexp and aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
(with large amounts of a
) input using jshell:
Pattern.compile("(a+)+")
.matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")
.matches()
But this completes pretty quickly each time I tried. Is the regexp implementation in Java different from others? Or the linked wikipedia page is wrong?
(BTW. I'm using Java 11, if that's relevant)
EDIT: Looks like it is Java version related, when I tried it on Java 8, it hangs, but in Java 9 and 11 it works right away. What did change between those versions that could affect that? Are all regex safe now in Java?
Is there a specific Java JEP that changed the regexp implementation? I would like to know what kind of regexps are still a problem for newer Java.
According to the article RSPEC-2631, the ReDoS issue has been handled in Java 9 and later:
Java runtimes like OpenJDK 9+ are mitigating this problem by having additional protections in their implementation of regular expression evaluation. In those runtime the example above is not vulnerable.
I am currently running Java 8 and the following code hangs:
Pattern.compile("(a|aa)+")
.matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab")
.matches()
Seeing as how you are using Java 11 (and have also tested it with Java 9/10) and have seen it take a small amount of time to complete, there was obviously a change that was made between those versions.
From looking at the source code of Matcher
in Java 11, we find the following addition that isn't present in Java 8:
/**
* Storage used by top greedy Loop node to store a specific hash set to
* keep the beginning index of the failed repetition match. The nodes
* themselves are stateless, so they rely on this field to hold state
* during a match.
*/
IntHashSet[] localsPos;
This local storage, along with a large amount of other code added, seems to be one of the main reasons why the state machine for regular expressions in Java 9+ completes much faster than in Java 8 and below.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With