Email address splitting

Tags:

c#

I have a string that I need to split on semicolons:

Email address: "one@tw;,.'o"@hotmail.com;"some;thing"@example.com

Both email addresses are valid.

So I want to have a List<string> of the following:

  • "one@tw;,.'o"@hotmail.com
  • "some;thing"@example.com

But the way I am currently splitting the addresses is not working:

var addresses = emailAddressString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => x.Trim()).ToList();

Because of the multiple ; characters I end up with invalid email addresses.

I have tried a few different ways, even going down the route of working out whether the string contains quotes, then finding the indexes of the ; characters and splitting from there, but it's a real pain.

Does anyone have any better suggestions?

Jamie Rees asked Nov 11 '15 13:11



3 Answers

Assuming that double-quotes are not allowed, except for the opening and closing quotes ahead of the "at" sign @, you can use this regular expression to capture e-mail addresses:

((?:[^@"]+|"[^"]*")@[^;]+)(?:;|$)

The idea is to capture either an unquoted [^@"]+ or a quoted "[^"]*" part prior to @, and then capture everything up to semicolon ; or the end anchor $.

var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world";
var mm = Regex.Matches(input, "((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)");
foreach (Match m in mm) {
    Console.WriteLine(m.Groups[1].Value);
}

This code prints

"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
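
Since the question asks for a List<string>, the loop above could be wrapped in a small helper. This is only a sketch: the class and method names (EmailSplitter.Split) are made up for illustration, and the Trim call is an assumption carried over from the question's original code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmailSplitter
{
    // Same pattern as above, written as a verbatim string ("" is one double quote).
    private static readonly Regex AddressPattern =
        new Regex(@"((?:[^@""]+|""[^""]*"")@[^;]+)(?:;|$)");

    // Hypothetical helper: returns group 1 of every match as a list.
    public static List<string> Split(string input)
    {
        return AddressPattern.Matches(input)
            .Cast<Match>()                         // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value.Trim()) // drop whitespace around each address
            .ToList();
    }
}
```

Calling EmailSplitter.Split(emailAddressString) would then replace the Split/Select/ToList chain from the question.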

If you would like to allow escaped double-quotes inside double-quotes, you could use a more complex expression:

((?:(?:[^@\"]|(?<=\\)\")+|\"([^\"]|(?<=\\)\")*\")@[^;]+)(?:;|$)

Everything else remains the same.
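
To use this pattern from C#, one option is a verbatim string literal, where backslashes are taken literally and each double quote is written twice. The sketch below assumes a hypothetical class and method name (EscapedQuoteSplitter.Split) that are not part of the original answer. It also drops the backslashes in front of the quotes, on the assumption that the regex engine does not need them.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedQuoteSplitter
{
    // (?<=\\)"" means: a double quote that is directly preceded by a backslash.
    private static readonly Regex AddressPattern = new Regex(
        @"((?:(?:[^@""]|(?<=\\)"")+|""([^""]|(?<=\\)"")*"")@[^;]+)(?:;|$)");

    // Hypothetical helper: group 1 still holds the whole address.
    public static List<string> Split(string input)
    {
        return AddressPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

With this, an input such as "a\"b;c"@example.com;plain@example.org should come back as two addresses, because the backslash-escaped quote no longer ends the quoted part.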

Sergey Kalinichenko answered Oct 19 '22 07:10


I evidently started writing my non-regex method at around the same time as juharr (see the other answer). Since I had already written it, I thought I would submit it anyway.

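    public static IEnumerable<string> SplitEmailsByDelimiter(string input, char delimiter)
    {
        var startIndex = 0;
        var delimiterIndex = 0;

        while (delimiterIndex >= 0)
        {
            // Find the next delimiter, skipping those already known to sit inside quotes.
            delimiterIndex = input.IndexOf(delimiter, startIndex);

            // The candidate is everything before that delimiter, or the whole remainder.
            string substring = delimiterIndex >= 0 ? input.Substring(0, delimiterIndex) : input;

            // Accept the candidate when it has no quote, when its quotes are closed,
            // or when no delimiter is left to extend it with.
            if (delimiterIndex < 0 || !substring.Contains("\"") || substring.IndexOf('"') != substring.LastIndexOf('"'))
            {
                // Skip empty entries, e.g. the one after a trailing delimiter.
                if (substring.Length > 0)
                {
                    yield return substring;
                }
                input = input.Substring(delimiterIndex + 1);
                startIndex = 0;
            }
            else
            {
                // The delimiter is inside an open quote; look for the next one.
                startIndex = delimiterIndex + 1;
            }
        }
    }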

Then the following code:

            var input = "[email protected];\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;[email protected];";
            foreach (var email in SplitEmailsByDelimiter(input, ';'))
            {
                Console.WriteLine(email);
            }

gives this output:

[email protected]
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
[email protected]

Darren Gourley answered Oct 19 '22 07:10


You can also do this without regular expressions. The following extension method lets you specify a delimiter character and a character that begins and ends escape sequences. Note that it does not validate that all escape sequences are closed.

public static IEnumerable<string> SpecialSplit(
    this string str, char delimiter, char beginEndEscape)
{
    int beginIndex = 0;
    int length = 0;
    bool escaped = false;
    foreach (char c in str)
    {
        if (c == beginEndEscape)
        {
            escaped = !escaped;
        }
            
        if (!escaped && c == delimiter)
        {
            yield return str.Substring(beginIndex, length);
            beginIndex += length + 1;
            length = 0;
            continue;
        }

        length++;
    }

    yield return str.Substring(beginIndex, length);
}

Then the following:

var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;\"D;D@blah;blah.com\"";
foreach (var address in input.SpecialSplit(';', '"'))
    Console.WriteLine(address);

will give this output:

"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
"D;D@blah;blah.com"
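
Because the method above does not check that every escape sequence is closed, a caller could run a quick check before splitting. A minimal sketch, assuming a hypothetical helper name (HasBalancedQuotes) and assuming that an even number of quote characters is an adequate test when no separate escape character is in play:

```csharp
using System.Linq;

public static class QuoteChecks
{
    // Hypothetical helper: true when every opening quote has a matching closing quote,
    // judged only by counting the quote characters.
    public static bool HasBalancedQuotes(string str, char quote)
    {
        return str.Count(c => c == quote) % 2 == 0;
    }
}
```

An input that fails this check has an unterminated quoted part, so everything after the last quote would be returned as a single segment.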

Here's a version that also supports a single escape character. Two consecutive escape characters become one literal escape character. An escaped beginEndEscape character does not start or end an escape sequence, and an escaped delimiter does not split the string. Any other character that follows the escape character is kept as is, with the escape character removed.

public static IEnumerable<string> SpecialSplit(
    this string str, char delimiter, char beginEndEscape, char singleEscape)
{
    StringBuilder builder = new StringBuilder();
    bool escapedSequence = false;
    bool previousEscapeChar = false;
    foreach (char c in str)
    {
        if (c == singleEscape && !previousEscapeChar)
        {
            previousEscapeChar = true;
            continue;
        }

        if (c == beginEndEscape && !previousEscapeChar)
        {
            escapedSequence = !escapedSequence;
        }

        if (!escapedSequence && !previousEscapeChar && c == delimiter)
        {
            yield return builder.ToString();
            builder.Clear();
            continue;
        }

        builder.Append(c);
        previousEscapeChar = false;
    }

    yield return builder.ToString();
}

Finally, you should probably add null checking for the string that is passed in. Note that both versions return a sequence containing one empty string if you pass in an empty string.
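
One way to add the null check is sketched below. Iterator methods (those using yield return) do not run any code until they are enumerated, so the check sits in a separate, non-iterator wrapper. The names SplitGuards and SplitOutsideQuotes are hypothetical. Skipping empty segments is an assumption, chosen to mirror the RemoveEmptyEntries option in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitGuards
{
    // Hypothetical wrapper: throws at call time rather than on first enumeration.
    public static IEnumerable<string> SplitOutsideQuotes(this string str, char delimiter, char quote)
    {
        if (str == null)
            throw new ArgumentNullException(nameof(str));
        return SplitIterator(str, delimiter, quote);
    }

    // Same scanning idea as the two-parameter SpecialSplit, but empty segments are skipped.
    private static IEnumerable<string> SplitIterator(string str, char delimiter, char quote)
    {
        int begin = 0;
        bool inQuotes = false;
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == quote)
            {
                inQuotes = !inQuotes;
            }
            else if (!inQuotes && str[i] == delimiter)
            {
                if (i > begin)
                    yield return str.Substring(begin, i - begin);
                begin = i + 1;
            }
        }

        // Emit the final segment only if it is not empty.
        if (begin < str.Length)
            yield return str.Substring(begin);
    }
}
```

With this wrapper, an empty input yields an empty sequence and a null input throws ArgumentNullException immediately.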

juharr answered Oct 19 '22 07:10