
Differences among .NET Capture, Group, Match

Tags: regex

I am making a small application using the .NET Regex types, and the Capture, Group, and Match types have me totally confused. I have never seen such an ugly solution. Could someone explain how they are used? Many thanks.

smwikipedia asked Feb 12 '10 07:02



1 Answer

Here's a simpler example than the one in the document @Dav cited:

string s0 = @"foo%123%456%789";
Regex r0 = new Regex(@"^([a-z]+)(?:%([0-9]+))+$");
Match m0 = r0.Match(s0);
if (m0.Success)
{
  Console.WriteLine(@"full match: {0}", m0.Value);
  Console.WriteLine(@"group #1: {0}", m0.Groups[1].Value);
  Console.WriteLine(@"group #2: {0}", m0.Groups[2].Value);
  Console.WriteLine(@"group #2 captures: {0}, {1}, {2}",
                    m0.Groups[2].Captures[0].Value,
                    m0.Groups[2].Captures[1].Value,
                    m0.Groups[2].Captures[2].Value);
}

result:

full match: foo%123%456%789
group #1: foo
group #2: 789
group #2 captures: 123, 456, 789

The full match and group #1 results are straightforward, but the others require some explanation. Group #2, as you can see, is inside a non-capturing group that's controlled by a + quantifier. It matches three times, but if you request its Value, you only get what it matched the third time around: the final capture. Similarly, if you use the $2 placeholder in a replacement string, the final capture is what gets inserted in its place.
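To make the replacement-string behavior concrete, here is a minimal sketch that reuses the same input and pattern. The class name ReplaceDemo, the helper ReplaceWithFinalCapture, and the "$1=$2" replacement string are my own illustrative choices, not part of the original example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Hypothetical helper for illustration: rewrites the input as "<group 1>=<group 2>".
    public static string ReplaceWithFinalCapture(string input)
    {
        Regex r0 = new Regex(@"^([a-z]+)(?:%([0-9]+))+$");
        // $1 expands to group #1; $2 expands to group #2's final capture only.
        return r0.Replace(input, "$1=$2");
    }

    public static void Main()
    {
        // Group #2 captured 123, 456 and 789, but only 789 is inserted.
        Console.WriteLine(ReplaceWithFinalCapture(@"foo%123%456%789"));
        // prints: foo=789
    }
}
```

The earlier captures (123 and 456) never show up in the result, which matches what Groups[2].Value reports.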

In most regex flavors, that's all you can get: each intermediate capture is overwritten by the next and lost. .NET is nearly alone in preserving all of the captures and making them available after the match is performed. You can access them directly by index, as I did here, or iterate through the CaptureCollection as you would a MatchCollection. There's no replacement-string placeholder, analogous to $1, that reaches the individual captures, though.
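Iterating the CaptureCollection might look like the sketch below. The class name CaptureDemo and the helpers AllCaptures and JoinCaptures are hypothetical names I introduced for illustration. The second helper shows one possible way around the missing placeholder: passing a MatchEvaluator delegate to Regex.Replace, which receives the Match and can therefore read every capture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private static readonly Regex R0 = new Regex(@"^([a-z]+)(?:%([0-9]+))+$");

    // Hypothetical helper: every capture of group #2, in the order it was matched.
    public static List<string> AllCaptures(string input)
    {
        var values = new List<string>();
        Match m = R0.Match(input);
        if (m.Success)
        {
            foreach (Capture c in m.Groups[2].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }

    // Hypothetical helper: a MatchEvaluator can use all captures in a replacement.
    public static string JoinCaptures(string input)
    {
        return R0.Replace(input, m =>
            m.Groups[1].Value + ":" +
            string.Join(",", m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value)));
    }

    public static void Main()
    {
        foreach (string value in AllCaptures(@"foo%123%456%789"))
        {
            Console.WriteLine(value);               // 123, then 456, then 789
        }
        Console.WriteLine(JoinCaptures(@"foo%123%456%789"));
        // prints: foo:123,456,789
    }
}
```

Each Capture also exposes Index and Length, so you can tell where in the input every repetition matched.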

So the reason the API design is so ugly (as you put it) is twofold: first it was adapted from Perl's integral regex support to .NET's object-oriented framework; then the CaptureCollection structure was grafted onto it. Perl 6 offers a much cleaner solution, but the authors accomplished that by rewriting Perl practically from scratch and throwing backward compatibility out the window.

Alan Moore answered Sep 21 '22 05:09
