
Python regex with *?

Tags: python, regex

What does this Python regex match?

.*?[^\\]\n

I'm confused about why the . is followed by both * and ?.

Mika H. asked Oct 16 '25 22:10


1 Answer

* means "match the previous element as many times as possible (zero or more times)".

*? means "match the previous element as few times as possible (zero or more times)".
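To make the two definitions above concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not part of the question:

```python
import re

# Hypothetical sample text, chosen only to show the two behaviours.
text = "<a><b>"

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.findall(r"<.*>", text)

# Lazy: .*? grabs as little as it can, so each match stops at the first '>'.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
```

Both patterns accept zero or more characters; the only difference is which of the possible matches the engine tries first.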

The other answers already address this, but they don't bring up how it changes the regex. If the re.DOTALL flag is provided, it makes a huge difference, because . will then match line break characters. So .*[^\\]\n would match from the beginning of the string all the way to the last newline character that is not preceded by a backslash (so several lines would match), whereas .*?[^\\]\n would stop at the first such newline.
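A quick way to see the re.DOTALL effect is to run both patterns with and without the flag. This is only a sketch; the three-line sample string is an assumption picked for illustration:

```python
import re

# Hypothetical input: two newline-terminated lines plus a trailing fragment.
s = "foo\nbar\nbaz"

# With re.DOTALL, the greedy .* crosses line breaks and runs to the last newline.
greedy_dotall = re.findall(r".*[^\\]\n", s, re.DOTALL)

# With re.DOTALL, the lazy .*? still stops at the first newline it can use.
lazy_dotall = re.findall(r".*?[^\\]\n", s, re.DOTALL)

# Without the flag, . cannot cross a newline, so greedy matches line by line here.
greedy_plain = re.findall(r".*[^\\]\n", s)

print(greedy_dotall)  # ['foo\nbar\n']
print(lazy_dotall)    # ['foo\n', 'bar\n']
print(greedy_plain)   # ['foo\n', 'bar\n']
```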

If the re.DOTALL flag is not provided, the difference is more subtle: [^\\] will match any character other than a backslash, including line break characters. Consider the following example:

>>> import re
>>> s = "foo\n\nbar"
>>> re.findall(r'.*?[^\\]\n', s)
['foo\n']
>>> re.findall(r'.*[^\\]\n', s)
['foo\n\n']

So the purpose of this regex is to find non-empty lines that don't end with a backslash, but if you use .* instead of .*? you will match an extra \n if you have an empty line following a non-empty line.

This happens because .*? will only match fo, [^\\] will match the second o, and then the \n matches at the end of the first line. However, .* will match foo, [^\\] will match the \n that ends the first line, and the final \n will match the next \n because the second line is blank.
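If you want to check that breakdown yourself, one option is to wrap each piece of the pattern in a capturing group. The groups are added purely for inspection and are not part of the original regex:

```python
import re

s = "foo\n\nbar"

# Same patterns as above, with each piece wrapped in a group so we can
# see which part of the string it consumed.
lazy_parts = re.match(r"(.*?)([^\\])(\n)", s).groups()
greedy_parts = re.match(r"(.*)([^\\])(\n)", s).groups()

print(lazy_parts)    # ('fo', 'o', '\n')
print(greedy_parts)  # ('foo', '\n', '\n')
```

In the greedy case the second group is the newline itself, which is why the match swallows the blank line.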

Andrew Clark answered Oct 18 '25 12:10