I am using mod security rules https://github.com/SpiderLabs/owasp-modsecurity-crs to sanitize user input data. I am facing cpu shoot up and delay in matching the user input with mod security rule regular expressions. Overall it contains 500+ regular expression to check different types of attacks(xss , badrobots , generic and sql). For each request , I go through all parameters and check against all these 500 regular expressions. I am using Matcher.find
to check the parameters. In this case some parameters fall in infinite looping , I tackled this using below technique.
Cancelling a long running regex match?.
Sanitize a user request took around ~500 ms and cpu % shoots up. I analyzed using visualvm.java.net with my test suite runner.
Cpu Profile Output
Please help me to reduce the cpu usage % and load average?
If possible, compile your regexes once and keep them around, rather than repeatedly (implicitly) compiling (especially inside a loop).
See java.util.regex - importance of Pattern.compile()? for more info.
I suggest you look at this paper: "Towards Faster String Matching for Intrusion Detection or Exceeding the Speed of Snort"
There are better ways to do the matching you describe. Essentially you take the 500 patterns you want to match and compile it into a single suffix tree which can very efficiently match an input against all the rules at once.
The paper explains that this approach was described as "Boyer-Moore Approach to Exact Set Matching" by Dan Gusfield.
Boyer-Moore is a well known algorithm for String matching. The paper describes a variation on Boyer-Moore for Set Matching.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With