I am using the following Regex
JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*
on the following type of data:
JOINTS DISPL.-X DISPL.-Y ROTATION
1 0.000000E+00 0.975415E+01 0.616921E+01
2 0.000000E+00 0.000000E+00 0.000000E+00
The idea is to extract two groups, each containing one data line (starting with the joint number: 1, 2, etc.). The C# code is as follows:
string jointPattern = @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";
MatchCollection mc = Regex.Matches(outFileSection, jointPattern );
foreach (Capture c in mc[0].Captures)
{
    JointOutput j = new JointOutput();
    string[] vals = c.Value.Split();
    j.Joint = int.Parse(vals[0]) - 1;
    j.XDisplacement = float.Parse(vals[1]);
    j.YDisplacement = float.Parse(vals[2]);
    j.Rotation = float.Parse(vals[3]);
    joints.Add(j);
}
However, this does not work: rather than returning two captured groups (the inside group), it returns one group: the entire block, including the column headers. Why does this happen? Does C# deal with un-captured groups differently?
Finally, are RegExes the best way to do this? (I really do feel like I have two problems now.)
Sometimes you want to use parentheses to group parts of an expression together, but you don't want the group to capture anything from the substring it matches. To do this use (?: and ) to enclose the group.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Parentheses are used for grouping in regular expressions as in arithmetic. They can be used to concatenate regular expressions containing the alternation operator, '|'. For example, '@(samp|code)\{[^}]+\}' matches both '@code{foo}' and '@samp{bar}'.
mc[0].Captures is equivalent to mc[0].Groups[0].Captures. Groups[0] always refers to the whole match, so there will only ever be the one Capture associated with it. The part you're looking for is captured in group #1, so you should be using mc[0].Groups[1].Captures.
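To make the difference concrete, here is a minimal sketch that counts the captures in each group. It reuses the pattern from the question; the class name CaptureDemo and the hard-coded sample string (with \r\n line endings, which the pattern requires) are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Same pattern as in the question.
    public const string JointPattern =
        @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";

    // Sample data from the question, assuming Windows (\r\n) line endings.
    public const string Sample =
        "JOINTS DISPL.-X DISPL.-Y ROTATION\r\n" +
        "1 0.000000E+00 0.975415E+01 0.616921E+01\r\n" +
        "2 0.000000E+00 0.000000E+00 0.000000E+00\r\n";

    public static void Main()
    {
        Match m = Regex.Match(Sample, JointPattern);

        // Group 0 is the whole match, so it holds exactly one capture.
        Console.WriteLine(m.Groups[0].Captures.Count); // prints 1

        // Group 1 sits inside the repeated non-capturing group,
        // so it records one capture per repetition.
        Console.WriteLine(m.Groups[1].Captures.Count); // prints 2
    }
}
```

The non-capturing group (?: ... ) only controls the repetition; it is the inner capturing group that accumulates one Capture for each data line.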
But your regex is designed to match the whole input in one attempt, so the Matches() method will always return a MatchCollection with only one Match in it (assuming the match is successful). You might as well use Match() instead:
Match m = Regex.Match(source, jointPattern);
if (m.Success)
{
    foreach (Capture c in m.Groups[1].Captures)
    {
        Console.WriteLine(c.Value);
    }
}
output:
1 0.000000E+00 0.975415E+01 0.616921E+01
2 0.000000E+00 0.000000E+00 0.000000E+00
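As for whether a regex is the best tool here: for whitespace-separated columns like these, plain string splitting may be enough. The following is only a sketch under a few assumptions: the JointParser class and its tuple return type are hypothetical stand-ins for the question's JointOutput, the values are parsed as double with the invariant culture, and any line that does not start with an integer (such as the header) is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class JointParser
{
    // Hypothetical helper: returns one tuple per data line instead of JointOutput.
    public static List<(int Joint, double X, double Y, double Rotation)> Parse(string section)
    {
        var rows = new List<(int Joint, double X, double Y, double Rotation)>();

        // Split on either line-ending character and drop the empty pieces.
        string[] lines = section.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // A null separator array means "split on any whitespace".
            string[] vals = line.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Skip the header and anything that does not start with a joint number.
            if (vals.Length != 4 ||
                !int.TryParse(vals[0], NumberStyles.Integer,
                              CultureInfo.InvariantCulture, out int joint))
            {
                continue;
            }

            rows.Add((
                joint - 1, // zero-based index, as in the question
                double.Parse(vals[1], CultureInfo.InvariantCulture),
                double.Parse(vals[2], CultureInfo.InvariantCulture),
                double.Parse(vals[3], CultureInfo.InvariantCulture)));
        }

        return rows;
    }

    public static void Main()
    {
        string section =
            "JOINTS DISPL.-X DISPL.-Y ROTATION\r\n" +
            "1 0.000000E+00 0.975415E+01 0.616921E+01\r\n" +
            "2 0.000000E+00 0.000000E+00 0.000000E+00\r\n";

        foreach (var row in Parse(section))
        {
            Console.WriteLine(row);
        }
    }
}
```

Unlike the regex, this version does not depend on \r\n line endings, and passing CultureInfo.InvariantCulture keeps the exponent notation parsing the same way regardless of the machine's locale.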