I have a .NET regex which I am testing using Windows PowerShell. The output is as follows:
> [System.Text.RegularExpressions.Regex]::Match("aaa aaa bbb", "aaa.*?bbb")
Groups : {aaa aaa bbb}
Success : True
Captures : {aaa aaa bbb}
Index : 0
Length : 11
Value : aaa aaa bbb
My expectation was that using the ? quantifier would cause the match to be aaa bbb, as the second group of a's is sufficient to satisfy the expression. Is my understanding of non-greedy quantifiers flawed, or am I testing incorrectly?
Note: this is plainly not the same problem as "Regular Expression nongreedy is greedy".
You make it non-greedy by using ".*?". When using the latter construct, the regex engine will, at every step it matches text into the ".", attempt to match whatever may come after the ".*?". This means that if for instance nothing comes after the ".
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.
A non-greedy match means that the regex engine matches as few characters as possible while still matching the pattern in the given string.
In general, the regex engine will try to match as many input characters as possible once it encounters a quantified token like \d+ or, in our case, .*. That behavior is called greedy matching because the engine will eagerly attempt to match anything it can.
This is a common misunderstanding. Lazy quantifiers do not guarantee the shortest possible match. They only make sure that the current quantifier, from the current position, does not match more characters than needed for an overall match.
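To make this concrete, here is a minimal sketch in Python rather than PowerShell. It assumes Python's re module treats the lazy quantifier the same way .NET does for this pattern, which is worth confirming in your own shell. It shows that the engine settles on the leftmost starting position, even though a shorter candidate exists further along the string.

```python
import re

text = "aaa aaa bbb"
lazy = re.compile(r"aaa.*?bbb")

# The engine tries start positions from left to right and keeps the
# first one that leads to an overall match. Index 0 works, so the lazy
# quantifier only minimises how far .*? expands from that position.
first = lazy.search(text)

# Forcing the search to begin at index 4 (the second "aaa") reveals the
# shorter candidate that the leftmost-first rule skipped.
later = lazy.search(text, 4)

print(first.start(), repr(first.group()))  # 0 'aaa aaa bbb'
print(later.start(), repr(later.group()))  # 4 'aaa bbb'
```

The second search is only there for illustration; it is not a practical way to get the shortest match.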
If you truly want to ensure the shortest possible match, you need to make that explicit. In this case, this means that instead of .*?, you want a subregex that matches anything that is neither aaa nor bbb. The resulting regex will therefore be:
aaa(?:(?!aaa|bbb).)*bbb
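A quick check of that pattern, sketched in Python on the assumption that its negative lookahead behaves like .NET's here. The string is the one from the question; the variable names are only for illustration.

```python
import re

text = "aaa aaa bbb"

# Each character between "aaa" and "bbb" is consumed only if neither
# "aaa" nor "bbb" starts at that position, so the match cannot run
# across a second "aaa".
tempered = re.compile(r"aaa(?:(?!aaa|bbb).)*bbb")

m = tempered.search(text)
print(m.start(), repr(m.group()))  # 4 'aaa bbb'
```

The attempt starting at index 0 fails because the lookahead stops consumption right before the second aaa, where bbb cannot match, so the engine moves on and succeeds at index 4.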
Compare the result for the string aaa aaa bbb bbb:
regex: aaa.*?bbb
result: aaa aaa bbb
regex: aaa.*bbb
result: aaa aaa bbb bbb
The regex engine finds the first occurrence of aaa and then skips characters (.*?) until the first occurrence of bbb, but with the greedy operator (.*) it will go on to find a larger result and therefore match up to the last occurrence of bbb.
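The comparison above can be reproduced with a short script. This is a sketch in Python, assuming its re module matches .NET's behaviour for these two quantifiers.

```python
import re

text = "aaa aaa bbb bbb"

# Lazy: expand .*? one character at a time until "bbb" can match.
lazy = re.search(r"aaa.*?bbb", text)

# Greedy: take everything, then back off until "bbb" can match.
greedy = re.search(r"aaa.*bbb", text)

print(repr(lazy.group()))    # 'aaa aaa bbb'
print(repr(greedy.group()))  # 'aaa aaa bbb bbb'
```

Both matches start at index 0; the quantifier only changes where the match ends.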