Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java : Matcher.find using high cpu utilization

I am using mod security rules https://github.com/SpiderLabs/owasp-modsecurity-crs to sanitize user input data. I am facing cpu shoot up and delay in matching the user input with mod security rule regular expressions. Overall it contains 500+ regular expression to check different types of attacks(xss , badrobots , generic and sql). For each request , I go through all parameters and check against all these 500 regular expressions. I am using Matcher.find to check the parameters. In this case some parameters fall in infinite looping , I tackled this using below technique.

Cancelling a long running regex match?.

Sanitize a user request took around ~500 ms and cpu % shoots up. I analyzed using visualvm.java.net with my test suite runner.

Cpu Profile Output

enter image description here

Please help me to reduce the cpu usage % and load average?

like image 624
kannanrbk Avatar asked Aug 23 '13 17:08

kannanrbk


2 Answers

If possible, compile your regexes once and keep them around, rather than repeatedly (implicitly) compiling (especially inside a loop).
See java.util.regex - importance of Pattern.compile()? for more info.

like image 90
Edward Avatar answered Oct 06 '22 04:10

Edward


I suggest you look at this paper: "Towards Faster String Matching for Intrusion Detection or Exceeding the Speed of Snort"

There are better ways to do the matching you describe. Essentially you take the 500 patterns you want to match and compile it into a single suffix tree which can very efficiently match an input against all the rules at once.

The paper explains that this approach was described as "Boyer-Moore Approach to Exact Set Matching" by Dan Gusfield.

Boyer-Moore is a well known algorithm for String matching. The paper describes a variation on Boyer-Moore for Set Matching.

like image 40
Ryan Avatar answered Oct 06 '22 05:10

Ryan