Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex with non-capturing group in C#

I am using the following Regex

JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*

on the following type of data:

 JOINTS               DISPL.-X               DISPL.-Y               ROTATION


     1            0.000000E+00           0.975415E+01           0.616921E+01
     2            0.000000E+00           0.000000E+00           0.000000E+00

The idea is to extract two groups, each containing a line (starting with the Joint Number, 1, 2, etc.) The C# code is as follows:

string jointPattern = @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";
MatchCollection mc = Regex.Matches(outFileSection, jointPattern );
foreach (Capture c in mc[0].Captures)
{
    JointOutput j = new JointOutput();
    string[] vals = c.Value.Split();
    j.Joint = int.Parse(vals[0]) - 1;
    j.XDisplacement = float.Parse(vals[1]);
    j.YDisplacement = float.Parse(vals[2]);
    j.Rotation = float.Parse(vals[3]);
    joints.Add(j);
}

However, this does not work: rather than returning two captured groups (the inside group), it returns one group: the entire block, including the column headers. Why does this happen? Does C# deal with un-captured groups differently?

Finally, are RegExes the best way to do this? (I really do feel like I have two problems now.)

like image 575
ian93 Avatar asked Mar 02 '13 03:03

ian93


People also ask

How do you write a non capturing group in regex?

Sometimes you want to use parentheses to group parts of an expression together, but you don't want the group to capture anything from the substring it matches. To do this use (?: and ) to enclose the group.

How do I match a group in regex?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

Which operator is required to group in regex?

Parentheses are used for grouping in regular expressions as in arithmetic. They can be used to concatenate regular expressions containing the alternation operator, ' | '. For example, ' @(samp|code)\{[^}]+\} ' matches both ' @code{foo} ' and ' @samp{bar} '.


1 Answers

mc[0].Captures is equivalent to mc[0].Groups[0].Captures. Groups[0] always refers to the whole match, so there will only ever be the one Capture associated with it. The part you're looking for is captured in group #1, so you should be using mc[0].Groups[1].Captures.

But your regex is designed to match the whole input in one attempt, so the Matches() method will always return a MatchCollection with only one Match in it (assuming the match is successful). You might as well use Match() instead:

  Match m = Regex.Match(source, jointPattern);
  if (m.Success)
  {
    foreach (Capture c in m.Groups[1].Captures)
    {
      Console.WriteLine(c.Value);
    }
  }

output:

1            0.000000E+00           0.975415E+01           0.616921E+01
2            0.000000E+00           0.000000E+00           0.000000E+00
like image 185
Alan Moore Avatar answered Sep 30 '22 23:09

Alan Moore