
Python re.sub() beginning-of-line anchoring

Consider the following multiline string:

>>> print s
shall i compare thee to a summer's day?
thou art more lovely and more temperate
rough winds do shake the darling buds of may,
and summer's lease hath all too short a date.

re.sub() replaces every occurrence of "and" with "AND":

>>> print re.sub("and", "AND", s)
shall i compare thee to a summer's day?
thou art more lovely AND more temperate
rough winds do shake the darling buds of may,
AND summer's lease hath all too short a date.

But re.sub() doesn't seem to anchor ^ to the beginning of each line, so adding it causes no occurrence of "and" to be replaced:

>>> print re.sub("^and", "AND", s)
shall i compare thee to a summer's day?
thou art more lovely and more temperate
rough winds do shake the darling buds of may,
and summer's lease hath all too short a date.

How can I use re.sub() with start-of-line (^) or end-of-line ($) anchors?

Adam Matan asked Jul 15 '13 07:07



2 Answers

You forgot to enable multiline mode.

re.sub("^and", "AND", s, flags=re.M)

re.M
re.MULTILINE

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

Source: the Python re module documentation.
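The behaviour described in the documentation excerpt can be checked with a short sketch. It rebuilds the string from the question by hand (the variable names are only illustrative) and uses the print() function form so that it also runs on Python 3:

```python
import re

# The sonnet excerpt from the question, rebuilt as a multiline string.
s = (
    "shall i compare thee to a summer's day?\n"
    "thou art more lovely and more temperate\n"
    "rough winds do shake the darling buds of may,\n"
    "and summer's lease hath all too short a date."
)

# Default mode: ^ matches only at the very start of the string,
# and the string starts with "shall", so nothing is replaced.
default_result = re.sub(r"^and", "AND", s)

# Multiline mode: ^ also matches immediately after each newline,
# so only the "and" that starts the last line is replaced.
multiline_result = re.sub(r"^and", "AND", s, flags=re.MULTILINE)

# $ is symmetric: with re.MULTILINE it matches before every newline
# as well as at the end of the string, so every line gets the marker.
marked = re.sub(r"$", " //", s, flags=re.MULTILINE)

# Without the flag, $ matches only at the end of the string here.
marked_default = re.sub(r"$", " //", s)

print(multiline_result)
print(marked)
```

Note that the mid-line "and" in the second line is left alone in both modes, because ^ never matches in the middle of a line.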

The flags argument to re.sub() isn't available in Python versions older than 2.7; in those cases you can set the flag directly in the regular expression, like so:

re.sub("(?m)^and", "AND", s)
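As a rough check that the inline flag and the flags argument are interchangeable, the following sketch compares both against a precompiled pattern. The two-line test string is a shortened stand-in for the question's text, and the variable names are illustrative only:

```python
import re

# Shortened stand-in for the string in the question.
s = (
    "thou art more lovely and more temperate\n"
    "and summer's lease hath all too short a date."
)

# 1. Multiline mode through the flags keyword argument.
with_flag = re.sub(r"^and", "AND", s, flags=re.M)

# 2. Multiline mode through the inline (?m) flag at the start of the pattern.
with_inline = re.sub(r"(?m)^and", "AND", s)

# 3. Multiline mode baked into a compiled pattern object.
pattern = re.compile(r"^and", re.M)
with_compiled = pattern.sub("AND", s)

print(with_flag == with_inline == with_compiled)
print(with_inline)
```

Keeping (?m) at the very beginning of the pattern, as both answers do, is the safe placement: recent Python 3 releases reject global inline flags that appear later in the expression.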
Ignacio Vazquez-Abrams answered Sep 30 '22 04:09


Add (?m) at the start of the pattern to enable multiline mode:

print re.sub(r'(?m)^and', 'AND', s)

See the re module documentation for details.

RichieHindle answered Sep 30 '22 06:09