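import re

s = 'abc defg'  # renamed from str to avoid shadowing the built-in
m1 = re.match(r".*(def)?", s)
m2 = re.match(r".*(def)", s)
# Print a tuple explicitly so Python 2 and Python 3 produce the same output
print((m1.group(1), m2.group(1)))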
The output of the above is:
(None, 'def')
What is going on? Even with a non-greedy repetition operator (.*?), the optional capture group (def)? is not matched.
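Here's what happens when the regex engine tries to match .*(def) against 'abc defg':

1. The engine starts trying to match the regex at the beginning of the string.
2. The greedy subpattern .* initially tries to match as many times as it can, matching the entire string.
3. Since that leaves nothing for the rest of the pattern, the engine backtracks, giving back one character at a time, until it finds a way to match (def), which happens when the .* matches only 'abc ' (including the trailing space).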
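However, if we change the regex to .*(def)?, the following happens instead:

1. The engine again starts at the beginning of the string.
2. It again tries to match .* as many times as possible, matching the entire string.
3. At that point, everything remaining in the regex is optional, so the engine has already found a match for the whole pattern. Since (def)? is greedy, the engine would prefer to match it if it could, but it's not going to backtrack earlier subpatterns just to see if it can. Instead, it just lets the .* gobble up the entire string, leaving nothing for (def)?.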
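
The two greedy cases can be made visible with a small sketch. It is an illustration rather than part of the original snippet: the extra capture group around .* is an addition made here purely to show how much of the string that subpattern consumed, which shifts (def) to group 2.

```python
import re

text = "abc defg"

# Group 1 wraps `.*` so we can see what it consumed (added for illustration);
# the original `(def)` group therefore becomes group 2.
required = re.match(r"(.*)(def)", text)
optional = re.match(r"(.*)(def)?", text)

# With a required `(def)`, `.*` is forced to backtrack to 'abc '.
print(required.groups())  # ('abc ', 'def')

# With an optional `(def)?`, `.*` keeps the whole string and the group is unset.
print(optional.groups())  # ('abc defg', None)
```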
Something similar happens with .*?(def) and .*?(def)?:

1. Again, the engine starts at the beginning of the string.
2. The non-greedy subpattern .*? tries to match as few times as it can, i.e. not at all.
3. At that point, (def) cannot match, but (def)? can, by matching zero times. Thus, for (def) the regex engine has to go back and consider longer matches for .*? until it finds one that lets the full pattern match, whereas for (def)? it doesn't have to do that, and so it doesn't.

For more information, see the "Combining RE Pieces" section of the Perl regular expressions manual (which matches the behavior of Python's "Perl-compatible" regular expressions).
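
The non-greedy cases can be checked the same way. As a hedged sketch, the group around .*? is again an addition for illustration only, so (def) is group 2 here rather than group 1 as in the question.

```python
import re

text = "abc defg"

# Group 1 wraps `.*?` to expose what it consumed (added for illustration).
lazy_required = re.match(r"(.*?)(def)", text)
lazy_optional = re.match(r"(.*?)(def)?", text)

# A required `(def)` forces `.*?` to grow until 'def' can match.
print(lazy_required.groups())  # ('abc ', 'def')

# An optional `(def)?` lets the engine stop immediately with an empty match.
print(lazy_optional.groups())        # ('', None)
print(repr(lazy_optional.group(0)))  # ''
```

In both the greedy and non-greedy versions, the optional group stays unset because the engine accepts the first overall match it finds and never revisits an earlier subpattern just to let a later optional one participate.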