Please don't answer the obvious, but what are the warning signs that tell us a problem should not be solved using regular expressions? For example: why is complete email validation too complex for a regular expression?

When is an issue too complex for a regular expression?

3 Answers

Regular expressions are a textual representation of finite-state automata. That is to say, they are limited to non-recursive matching. This means that you can't have any concept of "scope" or "sub-match" in your regexp. Consider the following problem:

(())()

Are all the open parens matched with a close paren?

Obviously, when we look at this as human beings, we can easily see that the answer is "yes". However, no regular expression can reliably answer this question for arbitrarily deep nesting: a finite-state automaton has only a fixed number of states, so it cannot keep count of an unbounded number of unclosed parens. To do this sort of processing, you need a full pushdown automaton (essentially a finite automaton with a stack). This is most commonly found in the guise of a parser, such as those generated by ANTLR or Bison.
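To make the role of the stack concrete, here is a minimal Python sketch of the pushdown-style check (the function name `parens_balanced` is hypothetical): push on every `(`, pop on every `)`, and accept only if the stack never underflows and ends up empty.

```python
def parens_balanced(text: str) -> bool:
    """Return True if every '(' is closed by a later ')' in proper nesting order."""
    stack = []  # the pushdown automaton's stack; here it only ever holds '('
    for ch in text:
        if ch == "(":
            stack.append(ch)  # push on an opening paren
        elif ch == ")":
            if not stack:
                return False  # a close paren with nothing left to match
            stack.pop()  # pop the most recent unmatched opening paren
    return not stack  # accept only if every opening paren was popped


print(parens_balanced("(())()"))  # True
print(parens_balanced("(()"))     # False
```

With a single bracket type an integer counter would do the same job; the explicit stack is kept because it generalizes to several bracket kinds, which a counter cannot.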

answered Sep 25 '22 16:09

Daniel Spiewak

A few things to look out for:

1. beginning and ending tag detection (matched pairing)
2. recursion
3. needing to go backwards (you can reverse the string, but that's a hack)

Regexes, as much as I love them, aren't good at those three things. And remember, keep it simple! If you're trying to build a regex that does "everything", then you're probably doing it wrong.
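One way to keep the regex simple is to split the work: a small pattern only finds individual tags, and an ordinary stack checks that they pair up. The sketch below assumes a deliberately simplified tag syntax (`<name>` and `</name>` only, no attributes, comments, or self-closing tags); the names `TAG` and `tags_nested_correctly` are hypothetical.

```python
import re

# Simplified tag pattern (an assumption for this sketch): matches <name> and </name> only.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def tags_nested_correctly(markup: str) -> bool:
    """Use the regex only to find tags; use a stack to check their pairing."""
    open_tags = []
    for match in TAG.finditer(markup):
        is_close, name = match.group(1) == "/", match.group(2)
        if not is_close:
            open_tags.append(name)  # remember a tag that must be closed later
        elif not open_tags or open_tags.pop() != name:
            return False  # closing tag with no matching, most recently opened tag
    return not open_tags  # every opened tag must have been closed


print(tags_nested_correctly("<b><i>x</i></b>"))  # True
print(tags_nested_correctly("<b><i>x</b></i>"))  # False
```

The regex never tries to do the pairing itself, so it stays short and readable; the matched-pairing logic lives in code that can actually handle it.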

answered Sep 24 '22 16:09

Jeff Atwood

When you need to parse an expression that's not defined by a regular language.
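To illustrate the distinction, assuming Python's `re` module: balanced parentheses nested at most k levels deep *do* form a regular language, so a pattern can be built for any fixed k (the helper `balanced_up_to_depth` is hypothetical). Balanced parentheses with unbounded nesting do not, and every such pattern fails one level deeper than it was built for.

```python
import re


def balanced_up_to_depth(k: int) -> re.Pattern:
    """Build a true regular expression for balanced parens nested at most k deep."""
    inner = ""  # depth 0: only the empty string is balanced
    for _ in range(k):
        inner = r"(?:\(" + inner + r"\))*"  # wrap one more level of nesting
    return re.compile(inner)


depth2 = balanced_up_to_depth(2)
print(bool(depth2.fullmatch("(())()")))  # True: nesting depth is 2
print(bool(depth2.fullmatch("((()))")))  # False: depth 3 exceeds what the pattern encodes
```

The pattern grows with k and still only covers a finite depth, which is exactly the limitation of a finite-state automaton.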

answered Sep 22 '22 16:09

Adam Rosenfield

Null303