Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pathological regex that blows up (time & memory)?

What's a pathological regex that blows up many parsers (both in time & memory)? and which parsers? Bonus points the more basic and standard the regex is, and the more likely that a non-malicious user might innocently come up with it. Feel free to post actual time and memory data, and parser version.

(I seem to remember that excessive lookbehind assertions or (EDIT:)backtracking in PERL are said to do this, or at least used to be. Anything else?)

like image 670
smci Avatar asked Mar 15 '11 03:03

smci


People also ask

Does regex affect performance?

Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.

Why is regex so complicated?

Regular expressions are dense. This makes them hard to read, but not in proportion to the information they carry. Certainly 100 characters of regular expression syntax is harder to read than 100 consecutive characters of ordinary prose or 100 characters of C code.

What is catastrophic backtracking regex?

Catastrophic backtracking is a condition that can occur if you're checking a (usually long) string against a complex regular expression. The problem usually occurs if something towards the end of the string causes the string to not match.


2 Answers

Adapted from the first example in the article Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...):

perl -e '$n=29; ("a" x $n) =~ (("a?" x $n).("a" x $n))'

Which takes 40+ seconds on my system. Then do $n++ for exponentially increasing fun...

like image 144
Eric Strom Avatar answered Oct 17 '22 19:10

Eric Strom


From Russ Cox's excellent article: $ perl -e '("a" x 100000) =~ /^(ab?)*$/;'. This apparently causes a segfault. There are more in the article.

like image 27
MAK Avatar answered Oct 17 '22 17:10

MAK