I'm a little confused about regular expressions and greedy vs lazy. It's really very simple and it feels like I'm missing something obvious.
I've simplified my problem as much as I can to make it clear. Consider the following string and regex pattern.
string:
aaxxxb
pattern:
(?<=a)(.*?)(?=b)
result:
axxx
what I expected:
xxx
This result is what I would expect from using .* instead of .*?, what am I missing?
Obviously, same thing if I use a.*?b gives me aaxxxb. Why is this? Shouldn't lazy (like .*?) return as few characters as possible?
You are missing the fact that a regex engine works from left to right, position by position, and succeeds as soon as it finds a match at the current position.
In your example, the first position where the pattern succeeds is at the second "a".
The laziness works only on the right side.
If you want to obtain "xxx", a better way is to use a negated character class [^ab]*
instead of .*?
Note: not exactly related to the subject, but good to know: a DFA regex engine will try to get the largest result in case of alternation, a NFA gives you the first that succeeds.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With