
Can anyone tell me why this C# email validation regular expression (regex) hangs?

Tags:

c#

regex

I got a good email validation regex from: Email regular expression

    public static void Main(string[] args)
    {
        string value = @"cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc";
        var regex = new Regex(
            @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
            RegexOptions.Compiled);
        var x = regex.Match(value); // Hangs here !?!
        return;
    }

It works in most cases, but the code above hangs, burning 100% CPU. I've tested it in a Windows 8 Metro app and in a standard .NET 4.5 app.

Can anyone tell me why this happens, and if there is a good email validation REGEX that doesn't hang, or if there is a way to fix this one?

Many thanks, Jon

Jon Rea asked Oct 26 '12 13:10


3 Answers

The explanation why it hangs: Catastrophic backtracking.

Let's simplify the crucial part of the regex:

(\w*[0-9a-zA-Z])*@

You have

  • an optional part \w* that can match the same characters as the following part [0-9a-zA-Z], so the two combined translate, in essence, to \w+
  • nested quantifiers: (\w+)*

This means that, given s = "cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc", this part of the regex has to try every way of splitting s into consecutive non-empty chunks (there are 2**(len(s)-1) of them) before it can report a non-match, because the following @ is never found.
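
To make that number concrete, here is a small sketch that counts the splits instead of running the regex. It models only the simplified (\w+)* part, and the class and method names (SplitCounter, CountSplits) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class SplitCounter
{
    // Number of ways to cut a run of n word characters into consecutive,
    // non-empty pieces. Each piece stands for one iteration of (\w+)*.
    public static long CountSplits(int n)
    {
        // The empty run can be split in exactly one way (no pieces at all).
        var memo = new Dictionary<int, long> { [0] = 1 };
        return Count(n, memo);
    }

    private static long Count(int n, Dictionary<int, long> memo)
    {
        if (memo.TryGetValue(n, out long cached)) return cached;

        long total = 0;
        // Let the first piece take k characters, then split the remaining n - k.
        for (int k = 1; k <= n; k++)
        {
            total += Count(n - k, memo);
        }

        memo[n] = total;
        return total;
    }

    public static void Main()
    {
        string s = "cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc";
        foreach (int n in new[] { 1, 2, 4, 10, s.Length })
        {
            Console.WriteLine($"{n} chars -> {CountSplits(n)} splits");
        }
    }
}
```

The count doubles with every extra character, which is why the match appears to hang rather than just run slowly.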

Since you cannot validate an e-mail address with any regex (there are far too many corner cases in the spec), it's usually best to

  • do a minimal regex check (^.*@.*$)
  • use a parser to check validity (like @Fake.It.Til.U.Make.It suggested)
  • try and send e-mail to it - even a seemingly valid address may be bogus, so you'd have to do this anyway.

Just for completeness, you can avoid the backtracking issues with the help of atomic groups:

    var regex = new Regex(
        @"^([0-9a-zA-Z](?>[-.\w]*[0-9a-zA-Z])*@(?>[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
        RegexOptions.Compiled);
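
A quick way to check the atomic-group version is to run it against the string from the question and against an ordinary address. The sketch below does that. The one-second match timeout (a Regex constructor overload available since .NET 4.5) is an extra safety net I added, not part of the pattern fix, and the class name and sample address are made up:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicRegexDemo
{
    // Same pattern as above. The timeout is an added precaution so that a
    // pathological input can never block the caller indefinitely.
    private static readonly Regex EmailRegex = new Regex(
        @"^([0-9a-zA-Z](?>[-.\w]*[0-9a-zA-Z])*@(?>[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
        RegexOptions.Compiled,
        TimeSpan.FromSeconds(1));

    public static bool IsMatch(string input)
    {
        try
        {
            return EmailRegex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat "took too long to decide" as "not a match".
            return false;
        }
    }

    public static void Main()
    {
        // The input from the question: no @, so this should be rejected quickly.
        Console.WriteLine(IsMatch("cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc"));

        // An ordinary address (example.com is just a placeholder domain).
        Console.WriteLine(IsMatch("jon.rea@example.com"));
    }
}
```

Once an atomic group has matched, the engine cannot go back into it to try a different split, so the failure on the missing @ is reported after roughly one pass over the input.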

Tim Pietzcker answered Nov 15 '22 00:11


Never use a regex to validate an email address.

You can use the MailAddress class to validate it:

    // Requires: using System.Net.Mail;
    try
    {
        address = new MailAddress(address).Address;
        // address is valid
    }
    catch (FormatException)
    {
        // address is invalid
    }
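
If you need this in more than one place, the try/catch can be wrapped in a small helper. This is only a sketch: EmailParser and TryParse are names I made up, and the null/empty guard is there because the MailAddress constructor throws argument exceptions (not FormatException) for those inputs:

```csharp
using System;
using System.Net.Mail;

public static class EmailParser
{
    // Hypothetical helper: returns true and the parsed address when
    // MailAddress accepts the input, false otherwise.
    public static bool TryParse(string input, out string address)
    {
        address = null;

        // The constructor throws on null or empty input, so reject those up front.
        if (string.IsNullOrWhiteSpace(input)) return false;

        try
        {
            address = new MailAddress(input).Address;
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

Note that MailAddress also accepts display-name forms such as "Jon <jon@example.com>", so compare the returned address with the original input if you only want bare addresses.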

Anirudha answered Nov 15 '22 00:11


I guess it's because of [-.\w] in the regex. Try this one instead:

^[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*@(?:(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

Also, .NET 4.5 should include EmailAddressAttribute (in System.ComponentModel.DataAnnotations), though I'm not sure.
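
For reference, a minimal sketch of calling that attribute directly, outside of model binding. The wrapper class and method names are made up, and the exact rules the attribute applies differ between .NET versions, so treat it as a loose check:

```csharp
using System;
using System.ComponentModel.DataAnnotations;

public static class AttributeDemo
{
    // Hypothetical wrapper around EmailAddressAttribute.IsValid.
    public static bool IsValidEmail(string input)
    {
        // IsValid comes from ValidationAttribute. A null value is treated as
        // valid, so combine this with a null check if null must be rejected.
        return new EmailAddressAttribute().IsValid(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidEmail("jon@example.com"));
        Console.WriteLine(IsValidEmail("no-at-sign"));
    }
}
```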

Sergio answered Nov 14 '22 23:11