Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are people using regexp for email and other complex validation?

There are a number of email regexp questions popping up here, and I'm honestly baffled why people are using these insanely obtuse matching expressions rather than a very simple parser that splits the email up into the name and domain tokens, and then validates those against the valid characters allowed for name (there's no further check that can be done on this portion) and the valid characters for the domain (and I suppose you could add checking for all the world's TLDs, and then another level of second level domains for countries with such (ie, com.uk)).

The real problem is that the tlds and slds keep changing (contrary to popular belief), so you have to keep updating the regexp if you plan on doing all this high level checking whenever the root name servers send down a change.

Why not have a module that simply validates domains, which pulls from a database, or flat file, and optionally checks DNS for matching records?

I'm being serious here, why is everyone so keen on inventing the perfect regexp for this? It doesn't seem to be a suitable solution to the problem...

Convince me that it's not only possible to do in regexp (and satisfy everyone) but that it's a better solution than a custom parser/validator.

-Adam

like image 558
Adam Davis Avatar asked Oct 17 '08 11:10

Adam Davis


1 Answers

They do it because they see "I want to test whether this text matches the spec" and immediately think "I know, I'll use a regex!" without fully understanding the complexity of the spec or the limitations of regexes. Regexes are a wonderful, powerful tool for handling a wide variety of text-matching tasks, but they are not the perfect tool for every such task and it seems that many people who use them lose sight of that fact.

like image 152
Dave Sherohman Avatar answered Oct 12 '22 13:10

Dave Sherohman