Here's a regex for validating emails: \S+@\S+\.\S+ (I didn't write it). I'm new to regular expressions and do not understand them all that well.
I have a couple of questions:
"How do I validate an email with regex" is one of the more popular questions that comes up about regular expressions, and the only really good answer is "you don't". It has been discussed on this very website on many occasions.
What you have to understand is that if you really wanted to follow the spec, your regex would look something like this. Obviously that is a monstrosity, and it is more an exercise in demonstrating how ridiculously difficult it is to accept everything you are supposed to be able to accept.
With that in mind, if you absolutely, positively need to know that the email address is valid, the only real way to check is to actually send a message to the address and see whether it bounces. Otherwise, this regex will properly validate most cases, and in a lot of situations most cases is enough. In addition, that page discusses the problems with trying to validate emails with regex.
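As a rough illustration of that advice, here is a minimal Python sketch: a deliberately loose screen, followed by the step that actually matters, which is sending a confirmation message. The names plausible_address and request_confirmation and the send callable are hypothetical and made up for this example; send stands in for whatever mail-sending code you already have.

```python
from typing import Callable


def plausible_address(address: str) -> bool:
    """Cheap screen only: something, then '@', then something.

    This is an assumption for illustration, not a spec-compliant check.
    """
    local, sep, domain = address.rpartition("@")
    return bool(local) and sep == "@" and bool(domain)


def request_confirmation(address: str, send: Callable[[str, str], bool]) -> bool:
    """Hand a confirmation message to `send` if the address passes the screen.

    `send` is a hypothetical callable (address, body) -> bool supplied by the
    caller; whether the address really works is only known once the message
    is delivered or bounces.
    """
    if not plausible_address(address):
        return False
    return send(address, "Please confirm that this address is yours.")


# Usage with a stand-in sender that only records what it was asked to send.
outbox = []
ok = request_confirmation("user@example.com", lambda to, body: outbox.append(to) or True)
print(ok, outbox)
```

The screen is intentionally permissive; the point is to let the mail system, not the pattern, decide whether the address is real.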
I'm only going to answer your first question, and from a technical regex point of view.
What is wrong with the regex \S+@\S+\.\S+ is that it has the potential to execute way too slowly. What happens if somebody enters an email string like the one below, and you need to validate it?
Or even worse (yes, those are 100 @'s after the dot):
@.@@@@@@@@@@@@@@@@@@@@@@@@@ \ @@@@@@@@@@@@@@@@@@@@@@@@@ \ @@@@@@@@@@@@@@@@@@@@@@@@@ \ @@@@@@@@@@@@@@@@@@@@@@@@@
Slowness happens. First the regex would greedily match as many characters as possible for the first \S+, so it will initially match the whole string. Then we need the @ character, so it will backtrack until it finds one. At that point we've got another \S+, so again it will consume everything until the end of the string. Then it needs to backtrack again until it finds a dot. Can you imagine how much backtracking occurs before the regex finally fails on the second email string?
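To get a feel for the amount of backtracking, here is a small Python sketch that mimics the process described above for \S+@\S+\.\S+ anchored at the start of the string, and counts how many positions it tries. It is a simplified model, not how any particular regex engine is implemented; the function name count_attempts is made up for this example, and the input is assumed to contain no whitespace.

```python
from typing import Tuple


def count_attempts(text: str) -> Tuple[int, bool]:
    r"""Model the backtracking of \S+@\S+\.\S+ on whitespace-free text.

    Returns (number of positions tried, whether a match was found).
    """
    n = len(text)
    attempts = 0
    # First \S+ grabs everything, then gives back one character at a time.
    for first_end in range(n, 0, -1):
        attempts += 1
        if first_end < n and text[first_end] == "@":
            # Second \S+ grabs the rest, then backs off looking for a dot.
            for second_end in range(n, first_end + 1, -1):
                attempts += 1
                dot_here = second_end < n and text[second_end] == "."
                # The final \S+ needs at least one character after the dot.
                if dot_here and second_end + 1 < n:
                    return attempts, True
    return attempts, False


evil = "@." + "@" * 100
print(count_attempts(evil))
print(count_attempts("user@example.com"))
```

On strings like the second one, the count grows roughly with the square of the input length, because every @ that the first \S+ backs up to triggers a fresh scan by the second \S+.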
To kill the backtracking, I suggest using possessive quantifiers on negated character classes in this case, which has the additional benefit of not allowing multiple @'s in one string.
[^@\s]++@[^@\s.]++\.[^@\s]++
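For anyone who wants to try it, here is a hedged Python sketch comparing the two patterns on a few inputs. Python's re module only accepts possessive quantifiers from version 3.11 on, so the sketch falls back to the same character classes with ordinary greedy quantifiers on older interpreters; that fallback is my assumption for illustration, and it gives the same accept/reject answers because each class already excludes the character that must follow it.

```python
import re

ORIGINAL = re.compile(r"\S+@\S+\.\S+")

try:
    # Possessive quantifiers (++) are accepted by Python's re from 3.11 on.
    TIGHT = re.compile(r"[^@\s]++@[^@\s.]++\.[^@\s]++")
except re.error:
    # Assumption for older interpreters: same classes, plain greedy quantifiers.
    TIGHT = re.compile(r"[^@\s]+@[^@\s.]+\.[^@\s]+")

candidates = ["user@example.com", "a@b@c.d", "@." + "@" * 100]
for candidate in candidates:
    # Print a shortened form of the input and what each pattern decides.
    print(
        repr(candidate[:20]),
        bool(ORIGINAL.fullmatch(candidate)),
        bool(TIGHT.fullmatch(candidate)),
    )
```

Note that a@b@c.d is accepted by the original pattern, since \S happily matches an @, while the stricter pattern rejects it.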
I did a quick benchmark for the two regexes against the “100 @'s email”. Mine is about 95 times faster.
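If you want to run a similar comparison yourself, a minimal sketch with Python's timeit could look like the following. This is not the benchmark I ran; the helper name seconds_per_call and the repeat count are made up for this example, the greedy fallback for interpreters older than 3.11 is an assumption, and the ratio you see will depend on the regex engine and the machine.

```python
import re
import timeit

evil = "@." + "@" * 100
original = re.compile(r"\S+@\S+\.\S+")

try:
    # Possessive quantifiers need Python 3.11 or newer.
    tight = re.compile(r"[^@\s]++@[^@\s.]++\.[^@\s]++")
except re.error:
    # Assumption for older interpreters: same classes, greedy quantifiers.
    tight = re.compile(r"[^@\s]+@[^@\s.]+\.[^@\s]+")


def seconds_per_call(pattern, text, repeats=200):
    """Average wall-clock time of one fullmatch attempt."""
    return timeit.timeit(lambda: pattern.fullmatch(text), number=repeats) / repeats


slow = seconds_per_call(original, evil)
fast = seconds_per_call(tight, evil)
print(f"original: {slow:.2e} s per call")
print(f"tight:    {fast:.2e} s per call")
```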