
How to extract consecutive email addresses from text in C#

Tags: c#, regex, email

I have the following three examples of strings:

string1 = "[email protected] this is just some text. these are just some numbers 123456 [email protected] asdasd asdad"

string2 = "[email protected] [email protected] This is just some text. these are just some numbers 123456 [email protected] asdasd asd"

string3 = "[email protected] [email protected] [email protected] This is just some text. these are just some numbers 123456 [email protected] asdad"

The final output should be a List<string> containing all the email addresses that appear consecutively at the beginning of the string.

Output for string1 - one email address

Output for string3 - three email addresses

The address "[email protected]" should be ignored because it appears in the middle of other text. Is there a way to do this? My existing method returns all the addresses.

    private List<string> ExtractEmails(string strStringGoesHere)
    {
        List<string> lstExtractedEmails = new List<string>();
        Regex reg = new Regex(@"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}", RegexOptions.IgnoreCase);
        Match match;
        for (match = reg.Match(strStringGoesHere); match.Success; match = match.NextMatch())
        {
            if (!(lstExtractedEmails.Contains(match.Value)))
            {
                lstExtractedEmails.Add(match.Value);
            }
        }
        return lstExtractedEmails;
    }
DevSa asked Jan 28 '26 06:01


1 Answer

You can use the \G anchor, which matches only at the start of the string or at the position where the previous match ended, so every match has to follow directly after the one before it:

@"(?i)\G\s*([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6})"

See this demo

Details

  • (?i) - inline case-insensitive flag
  • \G - anchor that matches at the start of the string or at the position where the previous match ended
  • \s* - zero or more whitespace characters
  • ([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}) - Group 1, matching an email-like substring (there are other patterns that you may use here, but generally it is something like \S+@\S+\.\S+).

C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var strs = new List<string> {"[email protected] this is just some text. these are just some numbers 123456 [email protected] asdasd asdad",
    "[email protected] [email protected] This is just some text. these are just some numbers 123456 [email protected] asdasd asd",
    "[email protected] [email protected] [email protected] This is just some text. these are just some numbers 123456 [email protected] asdad" };
foreach (var s in strs)
{
    // \G forces each match to start where the previous one ended,
    // so matching stops at the first token that is not an email.
    var results = Regex.Matches(s, @"(?i)\G\s*([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6})")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value);
    Console.WriteLine(string.Join(", ", results));
}

Results:

[email protected]
[email protected], [email protected]
[email protected], [email protected], [email protected]
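
If you want to keep the shape of the method from the question, the same pattern can be wrapped in a helper that returns a List<string>. This is only a sketch: the names LeadingEmailExtractor and ExtractLeadingEmails are hypothetical, the Distinct() call is an assumption that mirrors the duplicate check in the original ExtractEmails, and the example.com addresses in the usage comment are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LeadingEmailExtractor
{
    // Same email-like pattern as in the answer. \G anchors every match to the
    // start of the input or to the end of the previous match, so scanning
    // stops at the first token that is not an email address.
    private static readonly Regex LeadingEmail = new Regex(
        @"\G\s*([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6})",
        RegexOptions.IgnoreCase);

    // Hypothetical replacement for ExtractEmails: returns only the addresses
    // that appear consecutively at the beginning of the input.
    public static List<string> ExtractLeadingEmails(string input)
    {
        return LeadingEmail.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()   // assumption: drop repeats, as the original method did
            .ToList();
    }
}

// Usage (placeholder addresses):
// var emails = LeadingEmailExtractor.ExtractLeadingEmails(
//     "alice@example.com bob@example.org some text carol@example.net more");
// emails holds the first two addresses; the third is skipped because
// "some text" breaks the chain of consecutive matches.
```

Remove Distinct() if repeated addresses at the start of the string should be preserved.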
Wiktor Stribiżew answered Jan 29 '26 18:01