
Python regular expression matching using $

Tags:

python

I'm using Python 2.7.0 and doing the following in the interpreter:

>>> import re
>>> re.search(r"//\s*.*?$", "//\n\na12345678", flags=re.MULTILINE|re.DOTALL).group()
'//\n\na12345678'

This is not what I expected. I thought $ would match before the newline, but the match included the two newline characters AND the text after them.

Surprisingly, this works:

>>> re.search(r"//\s*.*?$", "//1\n\na12345678", flags=re.MULTILINE|re.DOTALL).group()
'//1'

What am I misunderstanding here about Python regular expressions?

Some more info:

>>> re.search(r"//\s*.*", "//\n  test").group()
'//\n  test'
>>> re.search(r"//\s*.*", "//1\n  test").group()
'//1'

This last block of code is without MULTILINE and DOTALL. What am I misunderstanding here? .* shouldn't match the newline, and definitely shouldn't go past it, right?

user2533302 asked Jan 28 '26 14:01


1 Answer

\s can match newlines, and when you use the re.DOTALL flag . can also match newlines.
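A quick way to check both claims in isolation is to test each construct against a lone newline. This is only a small sketch; the variable names are made up for illustration.

```python
import re

# "\n" is whitespace, so \s matches it regardless of flags.
s_matches_newline = re.match(r"\s", "\n") is not None

# By default, . matches any character except a newline.
dot_matches_newline_default = re.match(r".", "\n") is not None

# With re.DOTALL, . matches a newline as well.
dot_matches_newline_dotall = re.match(r".", "\n", flags=re.DOTALL) is not None

print(s_matches_newline)            # True
print(dot_matches_newline_default)  # False
print(dot_matches_newline_dotall)   # True
```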

In the first case your \s* is greedy, so because the first characters after the // in your string are newlines, \s* consumes both of them. The lazy .*? then starts on the final line, where $ cannot match until the very end of the string, so the match runs all the way to the end.

In the second case the \s* matches zero characters because of the 1 after the //, and the .*? only matches up to just before the first newline, since it is lazy and $ (with re.MULTILINE) can match there.
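To see which part of the pattern consumed which text, one option is to wrap \s* and .*? in capturing groups. The groups are added here purely for inspection and are not part of the original pattern; the names pattern, first and second are illustrative.

```python
import re

# Same pattern as in the question, with capturing groups added around
# \s* and .*? so each part's share of the match can be inspected.
pattern = re.compile(r"//(\s*)(.*?)$", flags=re.MULTILINE | re.DOTALL)

first = pattern.search("//\n\na12345678")
second = pattern.search("//1\n\na12345678")

# First case: \s* takes both newlines, .*? has to run to the end of the string.
print(repr(first.group(1)), repr(first.group(2)))    # '\n\n' 'a12345678'

# Second case: \s* matches nothing, .*? stops right before the first newline.
print(repr(second.group(1)), repr(second.group(2)))  # '' '1'
```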

If you want to match all whitespace except for newlines, you can use [ \t] in place of \s. For your examples, though, it looks like you will get the expected behavior if you just use the regex //.*?$ with the re.MULTILINE flag enabled. You can include re.DOTALL as well; it will not make a difference in this case.
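As a rough check of both suggestions, the snippet below runs the simplified pattern on the two strings from the question and tries the [ \t] variant on a made-up input ("//  note\nnext" is an assumed example, not from the question).

```python
import re

# Simplified pattern: no \s*, only re.MULTILINE.
simple_a = re.search(r"//.*?$", "//\n\na12345678", flags=re.MULTILINE).group()
simple_b = re.search(r"//.*?$", "//1\n\na12345678", flags=re.MULTILINE).group()

# Adding re.DOTALL gives the same result here, because the lazy .*? stops
# at the first position where $ can match.
with_dotall = re.search(r"//.*?$", "//\n\na12345678",
                        flags=re.MULTILINE | re.DOTALL).group()

# [ \t]* skips spaces and tabs but never crosses a newline.
horizontal = re.search(r"//[ \t]*.*?$", "//  note\nnext",
                       flags=re.MULTILINE).group()

print(repr(simple_a))     # '//'
print(repr(simple_b))     # '//1'
print(repr(with_dotall))  # '//'
print(repr(horizontal))   # '//  note'
```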

Andrew Clark answered Jan 30 '26 06:01


