Are there any tools that will take a particular regular expression and return the worst case scenario in terms of the number of operations required for a certain number of characters that the regular expression is matched against?
So for example, given a (f|a)oo.*[ ]baz
, how many steps might the engine possibly go though through to match 100 characters?
I would also be interested if there is a tool that can take a bunch of text samples and show the average operations for each run.
I realize this will depend a lot on the engine used and the implementation -- but I am ignorant as to how common this is. So if it is common for many languages (making my question too vague) I would be particularly interested in Perl and Python.
In that case, a regular expression is (a+b)bbb(a+b). The anatomy of this regular expression is the following: (a+b) gives either "a" or "b" (a+b)* gives any string of "a"s and "b"s whatever.
Time Complexity of Regex is O(MxN). Hence, FlashText algorithm is much faster than Regex.
Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.
Regexbuddy's debugger shows how many steps engine would take to conclude match or not on a given sample. More information on catastrophic backtracking and debugging regular expressions.
PS: It is not free but they offer a 3-month money-back guarantee.
Note that it depends on the engine. While regex theory is based on straight automata theory, most of the engines are not strict translations of those theories. For this reason, for instance, some engines incur in exponential time while strict NFA processing would not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With