When matching an expression across multiple lines, I have always used re.DOTALL and it worked fine. Now I have stumbled across the re.MULTILINE flag, and it looks like it does the same thing.
From the re module (doesn't make it clearer, but the values are different):
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string
So is there a difference in usage, and what are the subtle cases where the two flags could return something different?
With the re.DOTALL flag, the dot (.) character also matches the newline character, in addition to every other character. Without the flag, the regex engine does not let the dot match a newline.
The re.MULTILINE search modifier forces the ^ symbol to match at the beginning of each line of text (and not just the first), and the $ symbol to match at the end of each line of text (and not just the last one). The re.MULTILINE search modifier takes no arguments.
Python's re.compile() function compiles a regular expression pattern, provided as a string, into a regex pattern object (re.Pattern). The pattern object can then be used to search different target strings with its methods, such as match() or search().
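As a minimal sketch of the above, a flag can be passed once at compile time and the resulting pattern object reused. The sample text and variable names here are made up for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "foo\nbar\nbaz"

# Compile once, passing the flag at compile time.
line_start = re.compile(r"^\w+", flags=re.MULTILINE)

# The compiled pattern object can be reused across method calls.
words = line_start.findall(text)
print(words)  # ['foo', 'bar', 'baz']

first = line_start.search(text)
print(first.group())  # foo
```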
A Regular Expression (RE) in a programming language is a special text string used for describing a search pattern.
They are quite different. Yes, both affect how newlines are treated, but they switch behaviour for different concepts.
re.MULTILINE affects where the ^ and $ anchors match.
Without the switch, ^ matches only at the start of the whole text and $ only at the end (or just before a newline that ends the text). With the switch, ^ also matches right after each newline and $ right before each newline:
>>> import re
>>> re.search('foo$', 'foo\nbar') is None # no match
True
>>> re.search('foo$', 'foo\nbar', flags=re.MULTILINE)
<_sre.SRE_Match object; span=(0, 3), match='foo'>
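One subtle case worth checking: even without re.MULTILINE, $ tolerates a single newline at the very end of the text. The following sketch uses made-up inputs to show this, with \Z as the stricter alternative.

```python
import re

# Without flags, '$' matches at the end of the string, or just before
# a newline that is the last character of the string.
trailing = re.search(r"foo$", "foo\n")
print(trailing is not None)  # True

# A newline in the middle of the text is not enough without re.MULTILINE.
middle = re.search(r"foo$", "foo\nbar")
print(middle is None)  # True

# '\Z' matches only at the very end, so the trailing newline blocks it.
strict = re.search(r"foo\Z", "foo\n")
print(strict is None)  # True
```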
re.DOTALL affects what the . pattern can match.
Without the switch, . matches any character except a newline. With the switch, newlines are matched as well:
>>> re.search('foo.', 'foo\nbar') is None # no match
True
>>> re.search('foo.', 'foo\nbar', flags=re.DOTALL)
<_sre.SRE_Match object; span=(0, 4), match='foo\n'>
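To see the two flags diverge on the same input, here is a small sketch that runs one pattern containing both anchors and a dot under each flag combination. The sample text and the helper name find_all are assumptions for illustration only.

```python
import re

# Hypothetical two-line sample input.
text = "first line\nsecond line"

# The pattern uses both anchors and a dot, so each flag has something to change.
pattern = r"^.+$"

def find_all(flags=0):
    """Return all matches of the pattern in the sample text under the given flags."""
    return re.findall(pattern, text, flags=flags)

no_flags = find_all()                          # dot stops at '\n', '$' fails there
multiline = find_all(re.MULTILINE)             # one match per line
dotall = find_all(re.DOTALL)                   # one match spanning both lines
both = find_all(re.MULTILINE | re.DOTALL)      # greedy dot still swallows everything

print(no_flags)   # []
print(multiline)  # ['first line', 'second line']
print(dotall)     # ['first line\nsecond line']
print(both)       # ['first line\nsecond line']
```

The flags are independent bit values, so they can be combined with |. When both are set, the greedy .+ consumes the newline before the per-line anchors get a chance to split the match.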