I thought that my Regex would be greedy by default, which is the behavior I want, but it is not in the following code:
Regex keywords = new Regex(@"in|int|into|internal|interface");
var targets = keywords.ToString().Split('|');
foreach (string t in targets)
{
    Match match = keywords.Match(t);
    Console.WriteLine("Matched {0,-9} with {1}", t, match.Value);
}
Output:
Matched in        with in
Matched int       with in
Matched into      with in
Matched internal  with in
Matched interface with in
Now I realize that I could get it to work for this small example if I simply sorted the keywords by length descending, but
So my question is: Why is this being lazy and how do I fix it?
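Laziness and greediness apply to quantifiers only (?, *, +, {min,max}). Alternations are always tried in order, left to right, and the first alternative that matches is the one used. Here "in" comes first and matches at the start of every keyword, so the longer alternatives are never tried.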
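To make the distinction concrete, here is a minimal sketch that contrasts a greedy and a lazy quantifier with two orderings of the same alternation. The class name AlternationOrderDemo and the helper First are made up for this illustration; only Regex.Match comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Hypothetical helper: returns the text of the first match of pattern in input.
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Quantifiers: greediness decides how much a quantified token consumes.
        Console.WriteLine(First("aaaa", "a+"));   // greedy: prints "aaaa"
        Console.WriteLine(First("aaaa", "a+?"));  // lazy:   prints "a"

        // Alternation: the leftmost alternative that matches wins,
        // regardless of how long the other alternatives are.
        Console.WriteLine(First("interface", "in|int|interface")); // prints "in"
        Console.WriteLine(First("interface", "interface|int|in")); // prints "interface"
    }
}
```

Reordering the alternatives changes the result, while adding or removing a ? after a quantifier does not affect which alternative is chosen.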
It looks like you're trying to match whole words. To do that, the expression has to say where a word starts and ends, and your current one does not. Try this one instead:
new Regex(@"\b(in|int|into|internal|interface)\b");
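The "\b" matches a word boundary and is a zero-width match. A word boundary is the position between a word character (\w) and a non-word character (\W), or the start or end of the input. Exactly which characters count as word characters depends on the engine and its options, but in general this means the boundary falls next to whitespace and punctuation. Because it is a zero-width match, it will not contain the character that caused the regex engine to detect the word boundary. With the trailing "\b" in place, a short alternative such as "in" fails inside a longer word, and the engine backtracks to the next alternative.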
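As a quick check, the loop from the question can be rerun against the anchored pattern. This is a sketch, not the only way to do it: the class name WordBoundaryDemo and the helper MatchKeyword are invented for the example, and the keywords are listed in an explicit array because splitting the pattern text on '|' would now pick up the "\b(" and ")\b" pieces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // The anchored pattern suggested above.
    private static readonly Regex Keywords =
        new Regex(@"\b(in|int|into|internal|interface)\b");

    // Hypothetical helper: returns the matched text, or "" when nothing matches.
    public static string MatchKeyword(string input)
    {
        return Keywords.Match(input).Value;
    }

    public static void Main()
    {
        string[] targets = { "in", "int", "into", "internal", "interface" };
        foreach (string t in targets)
        {
            // Each keyword should now be reported as matching itself,
            // because "in" followed by more word characters fails at \b.
            Console.WriteLine("Matched {0,-9} with {1}", t, MatchKeyword(t));
        }

        // A word that merely starts with a keyword is not matched at all.
        Console.WriteLine(MatchKeyword("integer") == "" ? "no match" : "match");
    }
}
```

The order of the alternatives no longer matters for whole-word input, since only the alternative that reaches the closing boundary can succeed.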