Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is Java ReDos vulnerable?

I tried to recreate regular expression denial of service attack using (a+)+ regexp and aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! (with large amounts of a) input using jshell:

Pattern.compile("(a+)+")
    .matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")
    .matches()

But this completes pretty quickly each time I tried. Is the regexp implementation in Java different from others? Or the linked wikipedia page is wrong?

(BTW. I'm using Java 11, if that's relevant)

EDIT: Looks like it is Java version related, when I tried it on Java 8, it hangs, but in Java 9 and 11 it works right away. What did change between those versions that could affect that? Are all regex safe now in Java?

Is there a specific Java JEP that changed the regexp implementation? I would like to know what kind of regexps are still a problem for newer Java.

like image 654
Krzysztof Krasoń Avatar asked Oct 29 '18 15:10

Krzysztof Krasoń


2 Answers

According to the article RSPEC-2631, the ReDoS issue has been handled in Java 9 and later:

Java runtimes like OpenJDK 9+ are mitigating this problem by having additional protections in their implementation of regular expression evaluation. In those runtime the example above is not vulnerable.

like image 64
Happy Avatar answered Nov 02 '22 00:11

Happy


I am currently running Java 8 and the following code hangs:

Pattern.compile("(a|aa)+")
       .matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab")
       .matches()

Seeing as how you are using Java 11 (and have also tested it with Java 9/10) and have seen it take a small amount of time to complete, there was obviously a change that was made between those versions.

From looking at the source code of Matcher in Java 11, we find the following addition that isn't present in Java 8:

/**
 * Storage used by top greedy Loop node to store a specific hash set to
 * keep the beginning index of the failed repetition match. The nodes
 * themselves are stateless, so they rely on this field to hold state
 * during a match.
 */
IntHashSet[] localsPos;

This local storage, along with a large amount of other code added, seems to be one of the main reasons why the state machine for regular expressions in Java 9+ completes much faster than in Java 8 and below.

like image 22
Jacob G. Avatar answered Nov 02 '22 02:11

Jacob G.