Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Worst Case Analysis for Regular Expressions

Are there any tools that will take a particular regular expression and return the worst case scenario in terms of the number of operations required for a certain number of characters that the regular expression is matched against?

So for example, given a (f|a)oo.*[ ]baz, how many steps might the engine possibly go though through to match 100 characters?

I would also be interested if there is a tool that can take a bunch of text samples and show the average operations for each run.

I realize this will depend a lot on the engine used and the implementation -- but I am ignorant as to how common this is. So if it is common for many languages (making my question too vague) I would be particularly interested in Perl and Python.

like image 578
Kyle Brandt Avatar asked Jan 19 '11 02:01

Kyle Brandt


People also ask

What is BBB regular expression?

In that case, a regular expression is (a+b)bbb(a+b). The anatomy of this regular expression is the following: (a+b) gives either "a" or "b" (a+b)* gives any string of "a"s and "b"s whatever.

What is the time complexity of a regex?

Time Complexity of Regex is O(MxN). Hence, FlashText algorithm is much faster than Regex.

Does regex affect performance?

Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.


2 Answers

Regexbuddy's debugger shows how many steps engine would take to conclude match or not on a given sample. More information on catastrophic backtracking and debugging regular expressions.

catastrophic backtracking shown in RegexBuddy

PS: It is not free but they offer a 3-month money-back guarantee.

like image 110
Himanshu Avatar answered Sep 28 '22 04:09

Himanshu


Note that it depends on the engine. While regex theory is based on straight automata theory, most of the engines are not strict translations of those theories. For this reason, for instance, some engines incur in exponential time while strict NFA processing would not.

like image 30
Daniel C. Sobral Avatar answered Sep 28 '22 04:09

Daniel C. Sobral