Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python regular expression question mark operator not working?

import re
str='abc defg'
m1 = re.match(".*(def)?",str)
m2 = re.match(".*(def)",str)
print (m1.group(1),m2.group(1))

The output of the above is:

(None, 'def')

What is going on? Even with a non-greedy repetition operator, the optional capture group (def)? is not matched.

like image 466
ealfonso Avatar asked Jan 02 '13 01:01

ealfonso


1 Answers

Here's what happens when the regex engine tries to match .*(def) against abc defg:

  • First, the engine starts trying to match the regex at the beginning of the string.
  • The greedy subpattern .* initially tries to match as many times as it can, matching the entire string.
  • Since this causes the rest of the match to fail, the regex engine backtracks until it finds a way to match the (def), which happens when the .* matches only abc .

However, if we change the regex to .*(def)?, the following happens instead:

  • First, the regex engine again starts at the beginning of the string.
  • Next, it again tries to match .* as many times as possible, matching the entire string.
  • But at that point, since all the rest of the regex is optional, it has found a match for the entire regex! Since (def)? is greedy, the engine would prefer to match it if it could, but it's not going to backtrack earlier subpatterns just to see if it can. Instead, it just lets the .* gobble up the entire string, leaving nothing for (def)?.

Something similar happens with .*?(def) and .*?(def)?:

  • Again, the engine starts at the beginning of the string.
  • The ungreedy subpattern .*? tries to match as few times as it can, i.e. not at all.
  • At that point, (def) cannot match, but (def)? can. Thus, for (def) the regex engine has to go back and consider longer matches for .*? until it finds one that lets the full pattern match, whereas for (def)? it doesn't have to do that, and so it doesn't.

For more information, see the "Combining RE Pieces" section of the Perl regular expressions manual (which matches the behavior of Python's "Perl-compatible" regular expressions).

like image 61
Ilmari Karonen Avatar answered Oct 31 '22 16:10

Ilmari Karonen