What is the preferred way to filter a regex search for duplicate matches in C#?

This question follows from an earlier one of mine. I have some code that uses a regex to find email addresses. It works well, except that it returns duplicate matches. I searched this site and found an older question about a similar problem; the answer involved combining the regex logic with a string[] and the Distinct() method. Unfortunately, my understanding of arrays is still limited.

My code places all the regex matches into a MatchCollection. How do I work with this MatchCollection so that I keep only the unique matches?

Stev0 asked Aug 08 '10 21:08

1 Answer

In .NET 3.5 or newer you can do this with Distinct. Call Cast<Match>() on the match collection first so that the LINQ extension methods become available:

MatchCollection matchCollection = Regex.Matches(input, pattern);
List<string> matches = matchCollection
    .Cast<Match>()
    .Select(m => m.Value)
    .Distinct()
    .ToList();

This assumes that you have the following usings at the top of your file:

using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
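
For reference, here is a minimal sketch of the same approach as a complete console program. The sample input and the deliberately simple email pattern are assumptions made for illustration; they do not come from the question. Passing StringComparer.OrdinalIgnoreCase to Distinct is an optional variation, also an assumption, that treats addresses differing only in letter case as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input and pattern, used only to demonstrate the technique.
        string input = "Contact bob@example.com, alice@example.com or Bob@Example.com.";
        string pattern = @"[\w.+-]+@[\w-]+(\.[\w-]+)+";

        List<string> matches = Regex.Matches(input, pattern)
            .Cast<Match>()                               // expose the matches to LINQ
            .Select(m => m.Value)                        // keep only the matched text
            .Distinct(StringComparer.OrdinalIgnoreCase)  // drop case-insensitive duplicates
            .ToList();

        foreach (string match in matches)
        {
            Console.WriteLine(match);
        }
    }
}
```

With this input the regex finds three matches, and because two of them differ only in case, the resulting list holds two addresses. Calling Distinct() with no argument instead compares the strings exactly, as in the snippet above.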
Mark Byers answered Oct 05 '22 11:10
