I need to accept Regex from users--crazy, I know. The Google RE2 regular expression parser is safer than PCRE-based ones since it doesn't use backtracking, thus preventing catastrophic backtracking, infinite loops and general mayhem. It's purported to usually be faster. In my test case, which just parses a syslog line, it's over 300% slower. Any ideas why?
I'm using Node v7.7.3 on Ubuntu.
The code in question:
const SYSLOG_LINE_REGEX = new RegExp([
/(<[0-9]+>)?/, // 1 - optional priority
/([a-z]{3})\s+/, // 2 - month
/([0-9]{1,2})\s+/, // 3 - date
/([0-9]{2}):/, // 4 - hours
/([0-9]{2}):/, // 5 - minutes
/([0-9]{2})/, // 6 - seconds
/(\s+[\w.-]+)?\s+/, // 7 - host
/([\w\-().0-9/]+)/, // 8 - process
/(?:\[([a-z0-9-.]+)\])?:/, // 9 - optional pid
/(.+)/ // 10 message
].map(regex => regex.source).join(''), 'i');
const parts = SYSLOG_LINE_REGEX.exec(log.trim());
Update:
RE2 has linear worst-case complexity. Node.js's Irregexp engine has exponential worst-case complexity.
But! The worst-case behavior of an engine is a function not only of the regex but also of the input being tested. The regex /(a+)+$/
is worst-case exponential in Node.js but if you match it against anything but aaaaaaaaaa...a
it will run pretty quickly. The average-case match time of a regex is distinct from its worst-case complexity. The Node.js engine developers probably optimized the average-case complexity over the worst-case complexity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With