Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is node's backtracking regex faster than RE2 in this example

Tags:

regex

node.js

re2

I need to accept Regex from users--crazy, I know. The Google RE2 regular expression parser is safer than PCRE-based ones since it doesn't use backtracking, thus preventing catastrophic backtracking, infinite loops and general mayhem. It's purported to usually be faster. In my test case, which just parses a syslog line, it's over 300% slower. Any ideas why?

I'm using Node v7.7.3 on Ubuntu.

The code in question:

const SYSLOG_LINE_REGEX = new RegExp([
    /(<[0-9]+>)?/, // 1 - optional priority
    /([a-z]{3})\s+/, // 2 - month
    /([0-9]{1,2})\s+/, // 3 - date
    /([0-9]{2}):/, // 4 - hours
    /([0-9]{2}):/, // 5 - minutes
    /([0-9]{2})/, // 6 - seconds
    /(\s+[\w.-]+)?\s+/, // 7 - host
    /([\w\-().0-9/]+)/, // 8 - process
    /(?:\[([a-z0-9-.]+)\])?:/, // 9 - optional pid
    /(.+)/ // 10  message
].map(regex => regex.source).join(''), 'i');
const parts = SYSLOG_LINE_REGEX.exec(log.trim());

Update:

  • Using node module [email protected]
  • Using re2 C++ code dated 30 November 2016 that is included with the node-re2 package.
  • I have the libre2-dev package version 20160501+dfsg-1 installed. Perhaps I should either update the sources under node-re2 or have it simply link to the system libraries.
like image 437
Donald E. Foss Avatar asked Nov 08 '22 00:11

Donald E. Foss


1 Answers

RE2 has linear worst-case complexity. Node.js's Irregexp engine has exponential worst-case complexity.

But! The worst-case behavior of an engine is a function not only of the regex but also of the input being tested. The regex /(a+)+$/ is worst-case exponential in Node.js but if you match it against anything but aaaaaaaaaa...a it will run pretty quickly. The average-case match time of a regex is distinct from its worst-case complexity. The Node.js engine developers probably optimized the average-case complexity over the worst-case complexity.

like image 112
James Davis Avatar answered Nov 14 '22 22:11

James Davis