
Regular Expression Groups in C#

Tags: c#, regex

I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.

var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
    ...

For the input user == "Josh Smith [jsmith]":

matches.Count == 1
matches[0].Value == "[jsmith]"

... which I understand. But then:

matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?

From what I understand after looking at this question, the Groups collection stores the entire match as well as the previous match. But doesn't the regex above match only [open square bracket] [text] [close square bracket]? If so, why would "jsmith" match?

Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?

asked Jun 16 '11 17:06 by Lester



1 Answer

  • match.Groups[0] is always the same as match.Value, which is the entire match.
  • match.Groups[1] is the first capturing group in your regular expression.

Consider this example:

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

In this case,

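  • match.Value is "[john] John Johnson"
  • match.Groups[0] is always the same as match.Value, "[john] John Johnson".
  • match.Groups[1] is the group of captures from the (.*?). Its Value is "john".
  • match.Groups[2] is the group of captures from the (.*). Its Value is " John Johnson", including the leading space.
  • match.Groups[1].Captures is yet another dimension: it lists every substring the group captured, which matters when the group is repeated by a quantifier.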
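
To tie this back to the original question: Groups.Count is not fixed at 2. It is one entry for the entire match plus one per capturing group in the pattern. Below is a minimal sketch using the two-group pattern from this example; the names GroupsDemo and Find are made up for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Hypothetical helper: applies the two-group pattern from the example above.
    public static Match Find(string input)
    {
        return Regex.Match(input, @"\[(.*?)\](.*)");
    }

    public static void Main()
    {
        Match match = Find("ignored [john] John Johnson");

        // One entry for the whole match plus one per capturing group.
        Console.WriteLine(match.Groups.Count);    // 3
        Console.WriteLine(match.Groups[0].Value); // [john] John Johnson
        Console.WriteLine(match.Groups[1].Value); // john
        Console.WriteLine(match.Groups[2].Value); // " John Johnson" without the quotes (note the leading space)
    }
}
```

With the pattern from the question, which has a single pair of parentheses, the same check would give a count of 2, which is why "jsmith" shows up in Groups[1].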

Consider another example:

var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);

Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!

  • match.Groups[0] is always the same as match.Value, "[john][johnny]".
  • match.Groups[1] is the group of captures from the (\[.*?\])+. When a group is repeated, its Value is the last capture, "[johnny]", not the entire match.
  • match.Groups[1].Captures[0] is [john]
  • match.Groups[1].Captures[1] is [johnny], the same as match.Groups[1].Value
  • match.Groups[1].Captures.Count is 2, so there is no Captures[2].
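
A short sketch for checking what a repeated group holds. In .NET, a group under a quantifier keeps every iteration in Captures (in order), while its Value reports only the last one. The names CapturesDemo and GetCaptures are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Hypothetical helper: returns every substring captured by group 1.
    public static List<string> GetCaptures(string input)
    {
        Match match = Regex.Match(input, @"(\[.*?\])+");
        var result = new List<string>();

        // Captures is empty if the group did not participate in a match.
        foreach (Capture capture in match.Groups[1].Captures)
        {
            result.Add(capture.Value);
        }
        return result;
    }

    public static void Main()
    {
        Match match = Regex.Match("[john][johnny]", @"(\[.*?\])+");
        Console.WriteLine(match.Value);           // [john][johnny]
        Console.WriteLine(match.Groups[1].Value); // [johnny]

        foreach (string name in GetCaptures("[john][johnny]"))
        {
            Console.WriteLine(name);              // [john], then [johnny]
        }
    }
}
```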
answered Oct 02 '22 19:10 by agent-j