
What's the difference between re.DOTALL and re.MULTILINE? [duplicate]

Tags:

python

regex

When matching an expression across multiple lines, I have always used re.DOTALL and it worked fine. I have now come across the re.MULTILINE flag, and it looks like it does the same thing.

From the re module (doesn't make it clearer, but the values are different):

M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline

SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string

So is there a difference in usage, and what are the subtle cases where the two flags could return different results?

Jean-François Fabre asked Jan 12 '17 18:01



1 Answer

They are quite different. Yes, both affect how newlines are treated, but they switch behaviour for different concepts.

  • re.MULTILINE affects where ^ and $ anchors match.

    Without the switch, ^ matches only at the start of the whole text and $ only at the end (or just before a trailing newline at the very end). With the switch, ^ also matches immediately after every newline and $ immediately before every newline:

    >>> import re
    >>> re.search('foo$', 'foo\nbar') is None  # no match
    True
    >>> re.search('foo$', 'foo\nbar', flags=re.MULTILINE)
    <_sre.SRE_Match object; span=(0, 3), match='foo'>
    
  • re.DOTALL affects what the . pattern can match.

    Without the switch, . matches any character except a newline. With the switch, newlines are matched as well:

    >>> re.search('foo.', 'foo\nbar') is None  # no match
    True
    >>> re.search('foo.', 'foo\nbar', flags=re.DOTALL)
    <_sre.SRE_Match object; span=(0, 4), match='foo\n'>
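
To see that the two flags act independently, here is a minimal sketch that runs an anchor-based pattern and a dot-based pattern against the same text. The sample string, patterns, and variable names are made up for illustration and are not from the question:

```python
import re

# Hypothetical three-line sample text, chosen only for illustration.
text = "foo\nbar\nbaz"

# Anchors: only re.MULTILINE changes where ^ and $ can match.
anchors_plain = re.findall(r"^\w+$", text)
anchors_multiline = re.findall(r"^\w+$", text, flags=re.MULTILINE)
anchors_dotall = re.findall(r"^\w+$", text, flags=re.DOTALL)

# Dot: only re.DOTALL lets . cross a newline.
dot_plain = re.findall(r"foo.*baz", text)
dot_dotall = re.findall(r"foo.*baz", text, flags=re.DOTALL)
dot_multiline = re.findall(r"foo.*baz", text, flags=re.MULTILINE)

# The flags are separate bits, so they can be combined with |.
# Here ^ matches at the start of the second line and .* runs to the end.
combined = re.findall(r"^bar.*", text, flags=re.MULTILINE | re.DOTALL)

print(anchors_plain, anchors_multiline, anchors_dotall)
print(dot_plain, dot_dotall, dot_multiline)
print(combined)
```

Using the "wrong" flag for a pattern changes nothing: re.DOTALL has no effect on a pattern without a dot, and re.MULTILINE has no effect on a pattern without ^ or $. The same switches can also be written inline in the pattern as (?m) and (?s).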
    
Martijn Pieters answered Sep 30 '22 16:09