How to Remove Duplicate Matches in a MatchCollection

My MatchCollection contains several matches with the same value. For example:

string text = @"match match match";
Regex R = new Regex("match");
MatchCollection M = R.Matches(text);

How can I remove the duplicate matches, and what is the fastest way to do it?

Assume "duplicate" here means that the match contains the exact same string.

rayanisran asked Dec 21 '11 16:12



2 Answers

LINQ

If you are using .NET 3.5 or later (for example 4.7), LINQ can be used to remove the duplicate matches.

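string data = "abc match match abc";

// Requires: using System.Linq; using System.Text.RegularExpressions;
Console.WriteLine(string.Join(", ",
    Regex.Matches(data, @"([^\s]+)")
         .OfType<Match>()          // MatchCollection is non-generic, so type each item as Match
         .Select(m => m.Value)     // take the matched text (same as Groups[0].Value)
         .Distinct()               // drop repeated strings
         .ToArray()));             // needed on .NET 3.5, where string.Join only accepts string[]

// Outputs abc, match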
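The LINQ query above keeps only the matched strings. If the Match objects themselves are needed (for example, to read Index), one possible variation is to group by value and keep the first match of each group. This is a minimal sketch, not part of the original answer; the class name MatchDedup and the method name DistinctByValue are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class MatchDedup
{
    // Hypothetical helper: returns the first Match for each distinct matched string,
    // in the order the strings first appear in the input.
    public static List<Match> DistinctByValue(MatchCollection matches)
    {
        return matches.Cast<Match>()
                      .GroupBy(m => m.Value)   // one group per distinct matched string
                      .Select(g => g.First())  // keep the earliest match in each group
                      .ToList();
    }
}
```

For the sample input, MatchDedup.DistinctByValue(Regex.Matches("abc match match abc", @"[^\s]+")) should return two Match objects, one for abc and one for the first match.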

.NET 2.0 or no LINQ

Place the matches into a Hashtable, then extract the strings:

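string data = "abc match match abc";

MatchCollection mc = Regex.Matches(data, @"[^\s]+");

// Requires: using System.Collections; using System.Text.RegularExpressions;
Hashtable hash = new Hashtable();

foreach (Match mt in mc)
{
    string foundMatch = mt.Value;
    // Add each matched string only once; only the key matters, the value is unused.
    if (!hash.ContainsKey(foundMatch))
        hash.Add(foundMatch, string.Empty);
}

// Outputs abc and match, one per line (Hashtable enumeration order is not guaranteed).
foreach (DictionaryEntry element in hash)
    Console.WriteLine(element.Key);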
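Hashtable does not guarantee the order in which its entries are enumerated. If the unique strings should come back in the order they first appear, a generic Dictionary plus a List is one option that still avoids LINQ. This is a sketch under that assumption rather than part of the original answer; the names OrderedDedup and UniqueValues are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class OrderedDedup
{
    // Hypothetical helper: returns each distinct matched string once,
    // in first-seen order, without using LINQ.
    public static List<string> UniqueValues(MatchCollection matches)
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        List<string> ordered = new List<string>();

        foreach (Match m in matches)
        {
            if (!seen.ContainsKey(m.Value))
            {
                seen.Add(m.Value, true);  // remember the string for later lookups
                ordered.Add(m.Value);     // the list preserves insertion order
            }
        }
        return ordered;
    }
}
```

The dictionary lookup keeps each duplicate check cheap, and the list is what fixes the output order.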
ΩmegaMan answered Nov 15 '22 17:11



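Try a pattern with a named backreference, which matches a word that is immediately followed by the same word: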

Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled);
string text = @"match match match";
MatchCollection matches = rx.Matches(text);
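Note that this pattern finds adjacent repeated words rather than removing duplicates from a MatchCollection. The sketch below wraps the answer's pattern and adds a variation that collapses a run of repeated words into one with Regex.Replace. The variation is an assumption and not part of the original answer; the names AdjacentDuplicates, Pair, and Collapse are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class AdjacentDuplicates
{
    // The pattern from the answer: a word followed by whitespace and the same word.
    public static readonly Regex Pair =
        new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled);

    // Assumed variation: allow the repeat to occur one or more times,
    // then replace the whole run with a single copy of the word.
    public static string Collapse(string text)
    {
        return Regex.Replace(text, @"\b(?<word>\w+)(\s+\k<word>\b)+", "${word}");
    }
}
```

On the question's input "match match match", Pair should yield a single match covering the first two words, while Collapse should reduce the string to "match". Words that repeat non-adjacently, as in "abc match match abc", are left alone by both, so this only helps when duplicates sit next to each other.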
Amit Rai Sharma answered Nov 15 '22 17:11
