
Basic Python Regular expression: re.DOTALL *not* matching new lines

I want to turn all occurrences of A...A into B...B for some filler between the two A's. The filler must be allowed to contain new line characters. I assumed re.DOTALL was the solution.

Here's a Python script:

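import re

# Filler without a newline
tt1 = re.sub(r'A(?P<text>.*)A', r'B\g<text>B', 'AhiA')
print(tt1)
# Filler with a newline
tt1 = re.sub(r'A(?P<text>.*)A', r'B\g<text>B', 'A\nhiA')
print(tt1)
# Attempt: dot inside a character class
tt1 = re.sub(r'A(?P<text>[.]*)A', r'B\g<text>B', 'A\nhiA')
print(tt1)
# Attempt: re.DOTALL as the fourth positional argument
tt1 = re.sub(r'A(?P<text>.*)A', r'B\g<text>B', 'A\nhiA', re.DOTALL)
print(tt1)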

And here's the output:

BhiB
A
hiA
A
hiA
A
hiA

What gives, and how can I replace 'A\nhiA' with 'B\nhiB'?

Josh Vander Hook asked Jul 22 '13 17:07


People also ask

How do you match a new line character in Python?

To match a newline in a Python regex, use the pattern \n. Line breaks are \n on Linux, \r\n on Windows, and \r on classic Mac OS.

What is re.DOTALL in Python?

The re.DOTALL flag makes the dot (.) match any character, including a newline. Without the flag, the dot matches any character except a newline.
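
A minimal sketch of that behavior, using a made-up sample string: without the flag the dot stops at a newline, and with re.DOTALL (or its inline form (?s)) it crosses it.

```python
import re

sample = "A\nhiA"  # hypothetical input with a newline between the two A's

# By default, '.' matches any character except a newline, so nothing matches.
no_flag = re.search(r"A(.*)A", sample)

# With re.DOTALL, '.' also matches the newline.
with_flag = re.search(r"A(.*)A", sample, flags=re.DOTALL)

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline = re.search(r"(?s)A(.*)A", sample)

print(no_flag)                    # None
print(repr(with_flag.group(1)))   # '\nhi'
print(inline.group(1) == with_flag.group(1))  # True
```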

What is \r and \n in regex?

\n matches a newline (line feed) character. \r matches a carriage return character.
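
To illustrate, here is a small sketch that splits text on any of the three line-ending conventions. The sample text is made up. The alternation tries \r\n first so that a Windows line break is consumed as one separator rather than two.

```python
import re

# Hypothetical text mixing Unix (\n), Windows (\r\n) and old Mac (\r) line endings.
text = "one\ntwo\r\nthree\rfour"

# Try \r\n first so it is treated as a single line break.
lines = re.split(r"\r\n|\r|\n", text)
print(lines)  # ['one', 'two', 'three', 'four']

# str.splitlines() recognizes the same three conventions without a regex.
print(text.splitlines() == lines)  # True
```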


1 Answer

The fourth parameter to re.sub() is count (the maximum number of replacements to perform), not flags. re.DOTALL has the integer value 16, so your call is read as count=16 with no flags set: a valid argument in an unexpected place.

Use

re.sub(r'A(?P<text>.*)A', r'B\g<text>B', 'A\nhiA', flags=re.DOTALL)

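(or place re.DOTALL in position five, after an explicit count of 0, meaning "replace all"; note that passing count and flags positionally is deprecated since Python 3.13):

re.sub(r'A(?P<text>.*)A', r'B\g<text>B', 'A\nhiA', 0, re.DOTALL)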
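
A quick way to see what happened, as a sketch reusing the strings from the question: the flag's integer value is 16, so the fourth positional argument is read as count=16 and the dot still refuses the newline. Passing the flag by keyword fixes that. As a side note, the [.]* attempt cannot work either, because a dot inside a character class is a literal dot.

```python
import re

pattern = r'A(?P<text>.*)A'
repl = r'B\g<text>B'
subject = 'A\nhiA'

# The flag is just an integer under the hood.
print(int(re.DOTALL))  # 16

# Equivalent to the question's last call: count=16, no flags.
unchanged = re.sub(pattern, repl, subject, count=16)
print(repr(unchanged))  # 'A\nhiA'

# Keyword form: the flag reaches the flags parameter.
fixed = re.sub(pattern, repl, subject, flags=re.DOTALL)
print(repr(fixed))  # 'B\nhiB'

# Inside [...], '.' is literal, so [.]* matches only runs of dots.
dots = re.sub(r'A(?P<text>[.]*)A', repl, 'A...A')
print(dots)  # B...B
```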
Tim Pietzcker answered Oct 17 '22 16:10