
When is an issue too complex for a regular expression?

Please don't just state the obvious: what are the warning signs that tell us a problem should not be solved with regular expressions?

For example: why is complete email validation too complex for a regular expression?

Null303 asked Oct 23 '08 16:10


3 Answers

Regular expressions are a textual representation of finite-state automata. That is to say, they are limited to non-recursive matching. This means you can't express any concept of nested "scope" or of a "sub-match" nested inside another to arbitrary depth in your regexp. Consider the following problem:

(())()

Are all the open parens matched with a close paren?

Obviously, when we look at this as human beings, we can easily see that the answer is "yes". However, no regular expression (in the formal sense) can reliably answer this question for arbitrarily deep nesting, because a finite-state machine has no way to count an unbounded number of open parens. To do this sort of processing, you need a pushdown automaton (essentially a finite automaton with a stack). In practice this usually takes the form of a parser, such as those generated by ANTLR or Bison.
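The claim can be checked with a short Python sketch. `balanced` and `DEPTH_TWO` are hypothetical names chosen for illustration, and handling three bracket kinds goes slightly beyond the original example. The explicit stack stands in for the pushdown automaton's stack; the regex shows the best a true regex can do, which is to hard-code a fixed maximum nesting depth.

```python
import re

# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def balanced(text: str) -> bool:
    """Return True if every bracket in `text` is closed in the right order.

    The explicit stack is what a finite automaton lacks: it lets the
    checker remember an unbounded number of still-open brackets.
    """
    stack: list[str] = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with nothing open, or the wrong kind open, fails.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # anything left open means unbalanced


# A regex can only hard-code a fixed nesting depth. This one accepts
# sequences of parentheses nested at most two levels deep.
DEPTH_TWO = re.compile(r"(?:\((?:\(\))*\))*")


if __name__ == "__main__":
    print(balanced("(())()"))                     # True
    print(balanced("(()"))                        # False
    print(bool(DEPTH_TWO.fullmatch("(())()")))    # True
    print(bool(DEPTH_TWO.fullmatch("((()))")))    # False, although balanced
```

Each extra level of nesting would need another hand-written layer in the pattern, whereas the stack version handles any depth.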

Daniel Spiewak answered Sep 25 '22 16:09


A few things to look out for:

  1. beginning and ending tag detection -- matched pairing
  2. recursion
  3. needing to go backwards (you can reverse the string, but that's a hack)

Regexes, as much as I love them, aren't good at those three things. And remember, keep it simple! If you're trying to build a regex that does "everything", then you're probably doing it wrong.
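Matched pairing (item 1) is easy to see going wrong with nested tags. The Python sketch below is illustrative only: `TAG_PAIR`, `TAG`, and `pair_tags` are hypothetical names, and the tag pattern deliberately ignores attributes, comments, and self-closing tags. The regex with a backreference pairs the outer `<b>` with the inner `</b>`; the stack-based version still uses a regex, but only to find individual tags, and leaves the pairing to a stack.

```python
import re

# Naive attempt: an opening tag, anything (non-greedy), then a closing
# tag with the same name, enforced by the backreference \1.
TAG_PAIR = re.compile(r"<(\w+)>(.*?)</\1>")

# Tokenizer only: matches a single opening or closing tag (no attributes).
TAG = re.compile(r"<(/?)(\w+)>")


def pair_tags(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of matched tag pairs, in closing order."""
    stack: list[tuple[str, int]] = []  # (tag name, start index) of open tags
    pairs: list[tuple[int, int]] = []
    for tag in TAG.finditer(text):
        closing, name = tag.group(1), tag.group(2)
        if not closing:
            stack.append((name, tag.start()))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            pairs.append((start, tag.end()))
        else:
            raise ValueError(f"unexpected </{name}> at index {tag.start()}")
    if stack:
        raise ValueError(f"unclosed <{stack[-1][0]}>")
    return pairs


html = "<b>outer <b>inner</b> tail</b>"

print(TAG_PAIR.search(html).group(2))
# outer <b>inner  (the outer <b> was paired with the inner </b>)

print([html[s:e] for s, e in pair_tags(html)])
# ['<b>inner</b>', '<b>outer <b>inner</b> tail</b>']
```

Splitting the work this way, a simple regex for tokens plus a stack for structure, keeps each regex small.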

Jeff Atwood answered Sep 24 '22 16:09


When you need to parse an expression that's not defined by a regular language.
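A standard textbook example of a non-regular language is a^n b^n: some number of `a`s followed by the same number of `b`s. The pattern `a*b*` describes a regular language that contains every a^n b^n string but also strings such as `aab`, and no true regular expression can rule those out, because comparing the two counts needs an unbounded counter. The Python sketch below (`equal_a_then_b` is a hypothetical name) uses exactly such a counter.

```python
import re

# Regular: any run of a's followed by any run of b's, counts unrelated.
A_THEN_B = re.compile(r"a*b*")


def equal_a_then_b(text: str) -> bool:
    """Accept exactly the strings a^n b^n for n >= 0.

    The unbounded `count` is what a finite automaton cannot keep.
    """
    count = 0
    seen_b = False
    for ch in text:
        if ch == "a" and not seen_b:
            count += 1          # still in the run of a's
        elif ch == "b" and count > 0:
            seen_b = True
            count -= 1          # each b cancels one earlier a
        else:
            return False        # a after b, too many b's, or another char
    return count == 0


print(bool(A_THEN_B.fullmatch("aab")))  # True: the regex can't compare counts
print(equal_a_then_b("aab"))            # False
print(equal_a_then_b("aaabbb"))         # True
```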

Adam Rosenfield answered Sep 22 '22 16:09