Why can regular expressions have an exponential running time?

Tags:

regex

Russ Cox has a very detailed article about why this is and the history of regexes (part 2, part 3).

Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow. With the exception of backreferences, the features provided by the slow backtracking implementations can be provided by the automata-based implementations at dramatically faster, more consistent speeds.

Largely, it comes down to the proliferation of non-regular features, such as backreferences, in "regular" expressions, and to the (continued) unawareness of most programmers that better algorithms exist for regexes that do not use such features (which is many of them).
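To make the backtracking-versus-automata contrast concrete, here is a minimal sketch in Python. It is not how any real engine is implemented: the token format and the helper names (`make_tokens`, `backtrack`, `simulate`) are my own, and the pattern language is limited to literal and optional characters. It matches the pathological family a?ⁿaⁿ against the string aⁿ, once by recursive backtracking and once by tracking the set of reachable pattern positions, and counts the work each approach does.

```python
def make_tokens(n):
    # Toy encoding of a?^n a^n: n optional 'a's followed by n mandatory 'a's.
    return [("opt", "a")] * n + [("lit", "a")] * n

def backtrack(tokens, text, counter, ti=0, si=0):
    """Recursive backtracking matcher; counter[0] counts calls."""
    counter[0] += 1
    if ti == len(tokens):
        return si == len(text)
    kind, ch = tokens[ti]
    has_char = si < len(text) and text[si] == ch
    if kind == "opt":
        # Greedy: first try to consume the character, then try skipping it.
        if has_char and backtrack(tokens, text, counter, ti + 1, si + 1):
            return True
        return backtrack(tokens, text, counter, ti + 1, si)
    return has_char and backtrack(tokens, text, counter, ti + 1, si + 1)

def simulate(tokens, text):
    """Track the set of reachable token positions; returns (matched, steps)."""
    def closure(states):
        # An optional token may be skipped without consuming input.
        seen, stack = set(), list(states)
        while stack:
            t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            if t < len(tokens) and tokens[t][0] == "opt":
                stack.append(t + 1)
        return seen

    steps = 0
    current = closure({0})
    for ch in text:
        following = set()
        for t in current:
            steps += 1
            if t < len(tokens) and tokens[t][1] == ch:
                following.add(t + 1)
        current = closure(following)
    return len(tokens) in current, steps

n = 16
tokens = make_tokens(n)
text = "a" * n
calls = [0]
matched_bt = backtrack(tokens, text, calls)
matched_sim, steps = simulate(tokens, text)
print("backtracking:", matched_bt, "calls:", calls[0])
print("state sets:  ", matched_sim, "steps:", steps)
```

Both matchers agree on the answer, but the backtracking one has to try every combination of "consume" and "skip" for the optional characters before it finds the one that works, so its call count grows like 2ⁿ. The state-set version does at most (number of tokens + 1) steps per input character.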

While writing the text editor sam in the early 1980s, Rob Pike wrote a new regular expression implementation, which Dave Presotto extracted into a library that appeared in the Eighth Edition. Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on. (In his defense, Spencer knew the routines could be slow, and he didn't know that a more efficient algorithm existed. He even warned in the documentation, “Many users have found the speed perfectly adequate, although replacing the insides of egrep with this code would be a mistake.”) Pike's regular expression implementation, extended to support Unicode, was made freely available with sam in late 1992, but the particularly efficient regular expression search algorithm went unnoticed.

Regular expressions conforming to this formal definition can be matched in time linear in the length of the input, because each of them has a corresponding finite automaton. They are built only from parentheses, alternation | (sometimes called sum), the Kleene star *, and concatenation.
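As an illustration of why a finite automaton gives linear time, here is a small sketch in Python of a hand-built DFA for the textbook expression (a|b)*abb. The table and the function name `dfa_match` are my own, chosen for this example; a real engine would construct the automaton from the pattern rather than have it typed in.

```python
# Transition table of a DFA for (a|b)*abb; state 3 is the accepting state.
# State k means "the last characters read match the first k characters of 'abb'".
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
ACCEPTING = {3}

def dfa_match(text):
    """Return True if the whole of text matches (a|b)*abb."""
    state = 0
    for ch in text:
        row = DFA[state]
        if ch not in row:
            # Character outside the alphabet {a, b}: no transition, so no match.
            return False
        state = row[ch]  # exactly one table lookup per input character
    return state in ACCEPTING

for sample in ["abb", "ababb", "abba", "ab"]:
    print(sample, dfa_match(sample))
```

The loop never revisits a character and never has to undo a choice, so the running time is proportional to the length of the input regardless of what the input looks like.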

Extending regular expressions with, for example, backreferences can make the matching problem NP-complete. Here you can find an example of a regular expression recognizing non-prime numbers.
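For reference, one widely circulated pattern of this kind (not necessarily the exact one from the linked example) works on numbers written in unary, that is, n is represented as a string of n ones. A sketch using Python's `re` module, with a helper name `is_nonprime_unary` that I made up for the example:

```python
import re

# Matches a string of n ones when n is 0, 1, or composite.
# (11+?) captures a candidate divisor d >= 2 and \1+ demands that the rest
# of the string consist of one or more further copies of it, so n = d * k
# with k >= 2.
NONPRIME = re.compile(r"1?|(11+?)\1+")

def is_nonprime_unary(n):
    """Return True if n is not a prime number, tested via the regex."""
    return NONPRIME.fullmatch("1" * n) is not None

primes = [n for n in range(2, 30) if not is_nonprime_unary(n)]
print(primes)
```

No finite automaton can do this, because the set of unary strings of composite length is not a regular language. The backreference `\1` is what takes the pattern outside the class that can be matched in linear time, and the engine has to try candidate divisors one after another by backtracking.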

I guess that such an extended implementation can have non-linear matching time even in simple cases.

I made a quick experiment in Perl, and your regular expression runs equally fast for odd and even numbers of 'a's.
