In .NET, regex is not organizing captures as I would expect. (I won't call this a bug, because obviously someone intended it. However, it's not how I'd expect it to work nor do I find it helpful.)
This regex is for recipe ingredients (simplified for sake of example):
(?<measurement> # begin group
\s* # optional beginning space or group separator
(
(?<integer>\d+)| # integer
(
(?<numtor>\d+) # numerator
/
(?<dentor>[1-9]\d*) # denominator. 0 not allowed
)
)
\s(?<unit>[a-zA-Z]+)
)+ # end group. can have multiple
My string: 3 tbsp 1/2 tsp
Resulting groups and captures:
[measurement][0]=3 tbsp
[measurement][1]= 1/2 tsp
[integer][0]=3
[numtor][0]=1
[dentor][0]=2
[unit][0]=tbsp
[unit][1]=tsp
Notice how even though 1/2 tsp is in the 2nd Capture, its parts are in [0], since these spots were previously unused.
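For anyone who wants to reproduce the listing above, here is a minimal sketch. The class name CaptureDemo and the helper Describe are made up for illustration; the pattern is the one shown earlier, compiled with RegexOptions.IgnorePatternWhitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class CaptureDemo
{
    // Same pattern as in the question; comments inside it rely on IgnorePatternWhitespace.
    const string Pattern = @"
        (?<measurement>             # begin group
          \s*                       # optional beginning space or group separator
          (
            (?<integer>\d+)         # integer
            |
            (
              (?<numtor>\d+)        # numerator
              /
              (?<dentor>[1-9]\d*)   # denominator, 0 not allowed
            )
          )
          \s(?<unit>[a-zA-Z]+)
        )+                          # end group, can have multiple";

    // Lists every capture of every named group as "[group][index]=value".
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        Match m = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        foreach (string name in new[] { "measurement", "integer", "numtor", "dentor", "unit" })
        {
            CaptureCollection caps = m.Groups[name].Captures;
            for (int i = 0; i < caps.Count; i++)
                lines.Add(string.Format("[{0}][{1}]={2}", name, i, caps[i].Value));
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Describe("3 tbsp 1/2 tsp"))
            Console.WriteLine(line);
    }
}
```

Each named group keeps its own capture list, so the index within one group says nothing about which repetition of measurement the capture came from.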
Is there any way to get all of the parts to have predictable useful indexes without having to re-run each group through the regex again?
Is there any way to get all of the parts to have predictable useful indexes without having to re-run each group through the regex again?
Not with Captures. Each group fills its own capture list independently, so the indexes do not line up across groups. And if you're going to perform multiple matches anyway, I suggest you remove the + and match each component of the measurement separately, like so:
using System;
using System.Text.RegularExpressions;

string s = @"3 tbsp 1/2 tsp";
Regex r = new Regex(@"\G\s* # anchor to end of previous match
(?<measurement> # begin group
(
(?<integer>\d+) # integer
|
(
(?<numtor>\d+) # numerator
/
(?<dentor>[1-9]\d*) # denominator. 0 not allowed
)
)
\s+(?<unit>[a-zA-Z]+)
) # end group.
", RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);
foreach (Match m in r.Matches(s))
{
for (int i = 1; i < m.Groups.Count; i++)
{
Group g = m.Groups[i];
if (g.Success)
{
Console.WriteLine("[{0}] = {1}", r.GroupNameFromNumber(i), g.Value);
}
}
Console.WriteLine("");
}
Output:
[measurement] = 3 tbsp
[integer] = 3
[unit] = tbsp
[measurement] = 1/2 tsp
[numtor] = 1
[dentor] = 2
[unit] = tsp
The \G at the beginning ensures that matches occur only at the point where the previous match ended (or at the beginning of the input if this is the first match attempt). You can also save the match-end position between calls, then use the two-argument Matches method to resume parsing at that same point (as if that were really the beginning of the input).
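A rough sketch of the save-and-resume idea, under a few assumptions: it uses the instance method Regex.Match(string, int) rather than Matches, only so that one measurement is read per call, and the names ResumeDemo and ReadAt are invented for this example. The pattern is the one above, written on a single line.

```csharp
using System;
using System.Text.RegularExpressions;

class ResumeDemo
{
    // \G anchors the match at the start position passed to Match(input, startat).
    static readonly Regex Measurement = new Regex(
        @"\G\s*(?<measurement>((?<integer>\d+)|((?<numtor>\d+)/(?<dentor>[1-9]\d*)))\s+(?<unit>[a-zA-Z]+))",
        RegexOptions.ExplicitCapture);

    // Hypothetical helper: tries to read one measurement beginning at 'pos'.
    public static Match ReadAt(string input, int pos)
    {
        return Measurement.Match(input, pos);
    }

    static void Main()
    {
        string s = "3 tbsp 1/2 tsp";
        int pos = 0; // saved position; could be kept between separate calls
        while (pos < s.Length)
        {
            Match m = ReadAt(s, pos);
            if (!m.Success) break; // no measurement starts exactly here
            Console.WriteLine("[measurement] = {0}", m.Groups["measurement"].Value);
            pos = m.Index + m.Length; // resume where this match ended
        }
    }
}
```

Because of the \G, a call that starts in the middle of unrelated text fails instead of skipping ahead to the next measurement, which makes it easy to detect input the pattern does not understand.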
Seems like you probably need to loop through the input, matching one measurement at a time. Then you would have predictable access to the parts of that measurement, during the loop iteration for that measurement.
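As a sketch of what that loop could look like when the parts are collected into a list: MeasurementParser, Parse, and the (Amount, Unit) tuple are names chosen for this example, not anything from the question, and turning the fraction into a double is just one possible way to store it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class MeasurementParser
{
    // Matches exactly one measurement, anchored to the end of the previous match.
    static readonly Regex One = new Regex(
        @"\G\s*((?<integer>\d+)|(?<numtor>\d+)/(?<dentor>[1-9]\d*))\s+(?<unit>[a-zA-Z]+)",
        RegexOptions.ExplicitCapture);

    public static List<(double Amount, string Unit)> Parse(string input)
    {
        var result = new List<(double Amount, string Unit)>();
        // One iteration per measurement, so every group holds at most one value here.
        for (Match m = One.Match(input); m.Success; m = m.NextMatch())
        {
            double amount = m.Groups["integer"].Success
                ? double.Parse(m.Groups["integer"].Value)
                : double.Parse(m.Groups["numtor"].Value) / double.Parse(m.Groups["dentor"].Value);
            result.Add((amount, m.Groups["unit"].Value));
        }
        return result;
    }

    static void Main()
    {
        foreach (var item in Parse("3 tbsp 1/2 tsp"))
            Console.WriteLine("{0} {1}", item.Amount, item.Unit);
    }
}
```

Inside the loop body, m.Groups["integer"], m.Groups["numtor"], m.Groups["dentor"] and m.Groups["unit"] all refer to the same measurement, which is the predictable access the question asks for.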