So I have a string that I need to split by semicolons.
Email addresses: "one@tw;,.'o"@hotmail.com;"some;thing"@example.com
Both of the email addresses are valid.
So I want to have a List<string> of the following:
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
But the way I am currently splitting the addresses is not working:
var addresses = emailAddressString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x.Trim()).ToList();
Because of the multiple ; characters, I end up with invalid email addresses.
I have tried a few different ways, even going as far as checking whether the string contains quotes, then finding the index of the ; characters and working it out that way, but it's a real pain.
Does anyone have any better suggestions?
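To make the failure concrete, here is what the Split call from the question returns for the two addresses. It is only an illustration: SplitFailureDemo and NaiveSplit are made-up names wrapping the question's own code.

```csharp
using System;
using System.Linq;

public static class SplitFailureDemo
{
    // The approach from the question: split on every ';' and trim each piece.
    // Split knows nothing about quotes, so a ';' inside "..." is treated as a separator too.
    public static string[] NaiveSplit(string emailAddressString) =>
        emailAddressString
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.Trim())
            .ToArray();
}
```

For the question's input it returns four fragments, none of them a complete address:
"one@tw
,.'o"@hotmail.com
"some
thing"@example.com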
Assuming that double quotes are not allowed, except for the opening and closing quotes ahead of the "at" sign @, you can use this regular expression to capture e-mail addresses:
((?:[^@"]+|"[^"]*")@[^;]+)(?:;|$)
The idea is to capture either an unquoted [^@"]+ or a quoted "[^"]*" part prior to @, and then capture everything up to the semicolon ; or the end anchor $.
Here is a demo of the regex:
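// Requires: using System; using System.Text.RegularExpressions;
var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world";
var mm = Regex.Matches(input, "((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)");
foreach (Match m in mm) {
    // Group 1 is the address without the trailing ';'
    Console.WriteLine(m.Groups[1].Value);
}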
This code prints
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
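Since the question asks for a List<string>, the Regex.Matches loop above can be wrapped in a small helper. This is a sketch: the class and method names (RegexEmailSplitter, ExtractEmails) are made up, and the Trim call is an addition mirroring the question's own Select(x => x.Trim()), so that a space after a ; does not end up in the result.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexEmailSplitter
{
    // Same pattern as above: an unquoted or quoted local part, '@',
    // then everything up to the next ';' or the end of the input.
    private static readonly Regex EmailPattern =
        new Regex("((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)");

    // Collects capture group 1 of every match into a List<string>.
    public static List<string> ExtractEmails(string input) =>
        EmailPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value.Trim())
            .ToList();
}
```

For example, ExtractEmails("a@b.com; c@d.com") gives a@b.com and c@d.com; without the Trim the second entry would keep its leading space.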
If you would like to allow escaped double-quotes inside double-quotes, you could use a more complex expression:
((?:(?:[^@\"]|(?<=\\)\")+|\"([^\"]|(?<=\\)\")*\")@[^;]+)(?:;|$)
Everything else remains the same.
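To see what the lookbehind (?<=\\) buys you, the sketch below runs both patterns over an input whose first address contains a backslash-escaped quote and a ;. The class and member names are hypothetical. The first pattern cannot match at the opening quote, so the first match it finds starts at the escaped quote and returns "b;c"@x.com, losing the "a\ prefix; the second pattern keeps the whole quoted local part.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedQuoteComparison
{
    // First pattern: a quoted local part ends at the very next '"'.
    public const string SimplePattern = "((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)";

    // Second pattern (verbatim string, "" is one quote): a '"' preceded by a
    // backslash, checked with the lookbehind (?<=\\), does not end the quoted part.
    public const string EscapeAwarePattern =
        @"((?:(?:[^@""]|(?<=\\)"")+|""([^""]|(?<=\\)"")*"")@[^;]+)(?:;|$)";

    // Returns capture group 1 of every match of the given pattern.
    public static List<string> Extract(string input, string pattern) =>
        Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
}
```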
I apparently started writing my anti-regex method at around the same time as juharr (another answer). Since I had already written it, I thought I would submit it.
public static IEnumerable<string> SplitEmailsByDelimiter(string input, char delimiter)
{
    var startIndex = 0;
    var delimiterIndex = 0;
    while (delimiterIndex >= 0)
    {
        // Next delimiter at or after startIndex (-1 if there is none).
        delimiterIndex = input.IndexOf(delimiter, startIndex);
        string substring = input;
        if (delimiterIndex >= 0)
        {
            substring = input.Substring(0, delimiterIndex);
        }
        // Accept the piece if it has no quotes, or has both an opening and a closing quote.
        if (!substring.Contains("\"") || substring.IndexOf("\"") != substring.LastIndexOf("\""))
        {
            yield return substring;
            input = input.Substring(delimiterIndex + 1);
            startIndex = 0;
        }
        else
        {
            // Only one quote so far: this delimiter is inside a quoted part, so search past it.
            startIndex = delimiterIndex + 1;
        }
    }
}
Then the following
var input = "[email protected];\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;[email protected];";
foreach (var email in SplitEmailsByDelimiter(input, ';'))
{
    Console.WriteLine(email);
}
would give this output:
[email protected]
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
[email protected]
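The method above keeps extending the current piece past each delimiter until its quotes look balanced. The same idea can be sketched more compactly by splitting first and re-joining pieces while the quote count is odd. This is a hypothetical variant (SplitBalanced is not the answerer's code): it counts quotes rather than comparing the first and last quote positions, drops empty entries such as the one a trailing ; leaves behind, and returns the List<string> the question asked for.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuoteBalancedSplitter
{
    // Split on the delimiter, then glue pieces back together while the
    // running piece contains an odd number of '"' characters.
    public static List<string> SplitBalanced(string input, char delimiter)
    {
        var result = new List<string>();
        string pending = null; // text still waiting for its closing quote
        foreach (var piece in input.Split(delimiter))
        {
            pending = pending == null ? piece : pending + delimiter + piece;
            // An even quote count means every opening quote has been closed.
            if (pending.Count(c => c == '"') % 2 == 0)
            {
                if (pending.Length > 0)
                    result.Add(pending); // skips empty entries, e.g. after a trailing ';'
                pending = null;
            }
        }
        // Leftover text has an unclosed quote; keep it instead of silently dropping it.
        if (pending != null)
            result.Add(pending);
        return result;
    }
}
```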
You can also do this without using regular expressions. The following extension method lets you specify a delimiter character and a character that begins and ends escape sequences. Note that it does not validate that all escape sequences are closed.
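// Declare this inside a static class, as with any extension method.
public static IEnumerable<string> SpecialSplit(
    this string str, char delimiter, char beginEndEscape)
{
    int beginIndex = 0;
    int length = 0;
    bool escaped = false;
    foreach (char c in str)
    {
        // Each beginEndEscape character opens or closes an escaped run.
        if (c == beginEndEscape)
        {
            escaped = !escaped;
        }
        // Only split on delimiters that are outside an escaped run.
        if (!escaped && c == delimiter)
        {
            yield return str.Substring(beginIndex, length);
            beginIndex += length + 1;
            length = 0;
            continue;
        }
        length++;
    }
    // Whatever follows the last delimiter (possibly an empty string).
    yield return str.Substring(beginIndex, length);
}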
Then the following
var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;\"D;D@blah;blah.com\"";
foreach (var address in input.SpecialSplit(';', '"'))
    Console.WriteLine(address);
would give this output:
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
"D;D@blah;blah.com"
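To get the List<string> the question asks for, the extension method's output can be trimmed, filtered and materialized. The sketch below repeats SpecialSplit unchanged so it compiles on its own; SpecialSplitDemo and ToEmailList are hypothetical names. It also illustrates the caveat mentioned above: an unclosed quote swallows the rest of the input, so an input such as the following comes back as a single entry:
"a;b@x.com;c@y.com

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SpecialSplitDemo
{
    // Same toggle-based logic as the SpecialSplit extension above.
    public static IEnumerable<string> SpecialSplit(
        this string str, char delimiter, char beginEndEscape)
    {
        int beginIndex = 0;
        int length = 0;
        bool escaped = false;
        foreach (char c in str)
        {
            if (c == beginEndEscape)
                escaped = !escaped;           // entering or leaving a quoted run
            if (!escaped && c == delimiter)
            {
                yield return str.Substring(beginIndex, length);
                beginIndex += length + 1;     // skip past the delimiter
                length = 0;
                continue;
            }
            length++;
        }
        yield return str.Substring(beginIndex, length);
    }

    // Trim each entry and drop empty ones (e.g. after a trailing ';').
    public static List<string> ToEmailList(string input) =>
        input.SpecialSplit(';', '"')
             .Select(s => s.Trim())
             .Where(s => s.Length > 0)
             .ToList();
}
```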
Here's a version that works with an additional single escape character. It assumes that two consecutive escape characters should become one escape character, and that the escape character escapes both the beginEndEscape character (so it will not trigger the beginning or end of an escape sequence) and the delimiter. Anything else that comes after the escape character is left as is, with the escape character removed.
// Requires using System.Text; (for StringBuilder). Declare inside a static class.
public static IEnumerable<string> SpecialSplit(
    this string str, char delimiter, char beginEndEscape, char singleEscape)
{
    StringBuilder builder = new StringBuilder();
    bool escapedSequence = false;
    bool previousEscapeChar = false;
    foreach (char c in str)
    {
        // An unescaped singleEscape is dropped and marks the next character as literal.
        if (c == singleEscape && !previousEscapeChar)
        {
            previousEscapeChar = true;
            continue;
        }
        // An unescaped beginEndEscape toggles the escaped sequence.
        if (c == beginEndEscape && !previousEscapeChar)
        {
            escapedSequence = !escapedSequence;
        }
        // Split only on a delimiter that is neither inside a sequence nor escaped.
        if (!escapedSequence && !previousEscapeChar && c == delimiter)
        {
            yield return builder.ToString();
            builder.Clear();
            continue;
        }
        builder.Append(c);
        previousEscapeChar = false;
    }
    yield return builder.ToString();
}
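A usage sketch for the escape-character overload, assuming a backslash as the single escape character (any char would do). The method is repeated unchanged so the block compiles on its own; only the class name EscapeCharSplitter is made up. For the input below (actual characters, not a C# literal), the escaped ; does not split, the doubled backslash collapses to one, and the escaped quote does not open a quoted run:
"some;thing"@example.com;a\;b@x.com;c\\d@y.com;x\"y@z.com
so it should yield "some;thing"@example.com, a;b@x.com, c\d@y.com and x"y@z.com.

```csharp
using System.Collections.Generic;
using System.Text;

public static class EscapeCharSplitter
{
    // Same logic as the four-argument SpecialSplit above.
    public static IEnumerable<string> SpecialSplit(
        this string str, char delimiter, char beginEndEscape, char singleEscape)
    {
        var builder = new StringBuilder();
        bool escapedSequence = false;     // inside a beginEndEscape ... beginEndEscape run
        bool previousEscapeChar = false;  // previous char was an unconsumed singleEscape
        foreach (char c in str)
        {
            if (c == singleEscape && !previousEscapeChar)
            {
                previousEscapeChar = true; // drop the escape char itself
                continue;
            }
            if (c == beginEndEscape && !previousEscapeChar)
                escapedSequence = !escapedSequence;
            if (!escapedSequence && !previousEscapeChar && c == delimiter)
            {
                yield return builder.ToString();
                builder.Clear();
                continue;
            }
            builder.Append(c);
            previousEscapeChar = false;
        }
        yield return builder.ToString();
    }
}
```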
Finally, you should probably add null checking for the string that is passed in, and note that both methods will return a sequence containing one empty string if you pass in an empty string.
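On the null check: both SpecialSplit methods are iterator blocks (they use yield return), so a null check written at the top of them would not run until the caller starts enumerating. A common pattern is a non-iterator wrapper that validates eagerly and then delegates to a private iterator. The sketch below applies it to the first SpecialSplit; the names CheckedSplitter, SpecialSplitChecked and SpecialSplitIterator are hypothetical. It does not change the empty-string behaviour: an empty input still produces one empty entry.

```csharp
using System;
using System.Collections.Generic;

public static class CheckedSplitter
{
    // Not an iterator, so this check runs as soon as the method is called.
    public static IEnumerable<string> SpecialSplitChecked(
        this string str, char delimiter, char beginEndEscape)
    {
        if (str == null)
            throw new ArgumentNullException(nameof(str));
        return SpecialSplitIterator(str, delimiter, beginEndEscape);
    }

    // Same toggle logic as SpecialSplit; everything in here is deferred.
    private static IEnumerable<string> SpecialSplitIterator(
        string str, char delimiter, char beginEndEscape)
    {
        int beginIndex = 0, length = 0;
        bool escaped = false;
        foreach (char c in str)
        {
            if (c == beginEndEscape)
                escaped = !escaped;
            if (!escaped && c == delimiter)
            {
                yield return str.Substring(beginIndex, length);
                beginIndex += length + 1;
                length = 0;
                continue;
            }
            length++;
        }
        yield return str.Substring(beginIndex, length);
    }
}
```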