
Efficiently Combine MatchCollections in .NET regular expressions

In the simplified example below, there are two regular expressions, one case-sensitive, the other not. The idea is to efficiently create an IEnumerable collection (see "combined" below) that combines the results of both.

string test = "abcABC";
string regex = "(?<grpa>a)|(?<grpb>b)|(?<grpc>c)";
Regex regNoCase = new Regex(regex, RegexOptions.IgnoreCase);
Regex regCase = new Regex(regex);

MatchCollection matchNoCase = regNoCase.Matches(test);
MatchCollection matchCase = regCase.Matches(test);

// Combine matchNoCase and matchCase into an IEnumerable
IEnumerable<Match> combined = null; // placeholder: this is the part being asked about
foreach (Match match in combined)
{
    // Use the Index and (successful) Groups properties
    // of the match in another operation
}

In practice, the MatchCollections might contain thousands of results and be run frequently using long, dynamically created regular expressions, so I'd like to avoid copying the results to arrays, etc. I am still learning LINQ and am fuzzy on how to go about combining these, or on what the performance hit to an already sluggish process will be.

asked May 26 '10 23:05 by Laramie


1 Answer

There are three steps here:

  1. Convert each MatchCollection to an IEnumerable<Match>
  2. Concatenate the sequences
  3. Filter by whether the Match.Success property is true

Code:

IEnumerable<Match> combined = matchNoCase.OfType<Match>().Concat(matchCase.OfType<Match>()).Where(m => m.Success);

Doing this builds a lazily evaluated sequence: each step runs only as the next result is fetched, so you end up enumerating each collection only once in total, and nothing is copied into an intermediate array or list. For example, Concat() will only start enumerating the second sequence after the first runs out.
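
For reference, here is a minimal end-to-end sketch of the approach above. The MatchCombiner class and its Combine and SuccessfulGroupNames helpers are hypothetical names introduced only for illustration; the pattern assumes the trailing "]" in the question's regex was a typo. The Where(m => m.Success) filter is kept to mirror the one-liner, although as far as I know every Match returned by Regex.Matches is already successful, so it is probably redundant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchCombiner
{
    // Hypothetical helper: lazily chains two MatchCollections without
    // copying them into an array or list.
    public static IEnumerable<Match> Combine(MatchCollection first, MatchCollection second)
    {
        return first.OfType<Match>()
                    .Concat(second.OfType<Match>())
                    .Where(m => m.Success);
    }

    // Hypothetical helper: names of the named groups that participated
    // in a given match (group "0" is the whole match, so it is skipped).
    public static IEnumerable<string> SuccessfulGroupNames(Regex regex, Match match)
    {
        return regex.GetGroupNames()
                    .Where(name => name != "0" && match.Groups[name].Success);
    }
}

public static class Program
{
    public static void Main()
    {
        string test = "abcABC";
        // Assumption: the question's pattern without the stray "]".
        string pattern = "(?<grpa>a)|(?<grpb>b)|(?<grpc>c)";
        Regex regNoCase = new Regex(pattern, RegexOptions.IgnoreCase);
        Regex regCase = new Regex(pattern);

        IEnumerable<Match> combined =
            MatchCombiner.Combine(regNoCase.Matches(test), regCase.Matches(test));

        // Nothing is enumerated until this loop pulls results.
        foreach (Match match in combined)
        {
            string groups = string.Join(",", MatchCombiner.SuccessfulGroupNames(regCase, match));
            Console.WriteLine("{0}: {1} [{2}]", match.Index, match.Value, groups);
        }
    }
}
```

With this input the case-insensitive pass contributes six matches and the case-sensitive pass three, in that order. Note that enumerating "combined" a second time re-runs the whole pipeline, so if the results are needed more than once it may be worth weighing that against the cost of a single ToList() call.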

answered Oct 05 '22 22:10 by Rex M