I'm using Python 2.7.0 and doing the following in the interpreter:
>>> re.search (r"//\s*.*?$", "//\n\na12345678", flags=re.MULTILINE|re.DOTALL).group()
'//\n\na12345678'
This is not what I expected. I thought $ would match before the newline, but the match includes the two newline characters AND the text after them.
Surprisingly, this works:
>>> re.search (r"//\s*.*?$", "//1\n\na12345678", flags=re.MULTILINE|re.DOTALL).group()
'//1'
What am I misunderstanding here about python regular expressions?
Some more info:
>>> re.search(r"//\s*.*", "//\n test").group()
'//\n test'
>>> re.search(r"//\s*.*", "//1\n test").group()
'//1'
This last block of code is without MULTILINE and DOTALL. What am I misunderstanding here? .* shouldn't be matching the newline, and definitely shouldn't go past it, right?
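\s can match newlines, and when you use the re.DOTALL flag . can also match newlines. That is also why your last examples behave the same way without any flags: it is the \s* that consumes the newline, not the .*.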
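As a quick check of that statement, here is a minimal sketch (the variable names are only for illustration, and the capture groups are added to the pattern from the question to show which part consumed what):

```python
import re

# \s matches a newline regardless of flags
ws_matches_newline = re.match(r"\s", "\n") is not None

# . matches a newline only when re.DOTALL is set
dot_default = re.match(r".", "\n") is not None
dot_dotall = re.match(r".", "\n", re.DOTALL) is not None

# No flags at all: \s* consumes "\n ", then .* matches "test"
no_flags = re.search(r"//(\s*)(.*)", "//\n test").groups()

print(ws_matches_newline)  # True
print(dot_default)         # False
print(dot_dotall)          # True
print(no_flags)            # ('\n ', 'test')
```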
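In the first case your \s* is greedy, and the first characters after the // in your string are newlines, so \s* consumes both of them. The lazy .*? then tries to match as little as possible, but $ cannot match right before the a (with re.MULTILINE it only matches just before a newline or at the end of the string), so .*? keeps expanding until it has consumed the final line and $ can match at the very end of the string.
In the second case the \s* matches nothing (zero characters) because of the 1 after the //, and since .*? is lazy it only matches up to just before the first newline, where $ can already match.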
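One way to see this for yourself is to wrap the two quantified parts in capture groups. This is only a sketch; the groups are an addition for illustration and do not change what the overall pattern matches:

```python
import re

# Same pattern as in the question, with capture groups added
pattern = re.compile(r"//(\s*)(.*?)$", re.MULTILINE | re.DOTALL)

first = pattern.search("//\n\na12345678")
second = pattern.search("//1\n\na12345678")

# group 1 is what \s* consumed, group 2 is what .*? consumed
print(first.groups())   # ('\n\n', 'a12345678')
print(second.groups())  # ('', '1')
```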
If you want to match all whitespace except for newlines, you can use [ \t] in place of \s. It actually looks like for your examples you will get the expected behavior if you just use the regex //.*?$ with the re.MULTILINE flag enabled (re.DOTALL can be included as well; it will not make a difference in this case, because the lazy .*? stops at the first position where $ can match).
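A short sketch of both suggestions, run against the strings from the question. The names simple, no_newline_ws and with_dotall, and the "//   padded comment" input, are made up for this example:

```python
import re

text_a = "//\n\na12345678"
text_b = "//1\n\na12345678"

# Suggestion 1: drop \s* and let the lazy .*? stop at the first line end
simple = re.compile(r"//.*?$", re.MULTILINE)
# Suggestion 2: still skip leading spaces and tabs, but never newlines
no_newline_ws = re.compile(r"//[ \t]*.*?$", re.MULTILINE)
# Adding re.DOTALL to suggestion 1 gives the same result for these inputs
with_dotall = re.compile(r"//.*?$", re.MULTILINE | re.DOTALL)

print(repr(simple.search(text_a).group()))         # '//'
print(repr(simple.search(text_b).group()))         # '//1'
print(repr(no_newline_ws.search(text_a).group()))  # '//'
print(repr(with_dotall.search(text_a).group()))    # '//'
print(repr(no_newline_ws.search("//   padded comment\nnext").group()))
# '//   padded comment'
```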