python regex match and replace beginning and end of string but keep the middle

Question

I have a dataframe of holiday names. Some holidays are observed on a different day than the holiday itself, sometimes on the day of another holiday. Here are some example values:

1  "Independence Day (Observed)"
2  "Christmas Eve, Christmas Day (Observed)"
3  "New Year's Eve, New Year's Day (Observed)"
4  "Martin Luther King, Jr. Day"

I want to remove ' (Observed)' and, only when ' (Observed)' is matched, also remove everything up to and including the preceding comma. The output should be:

1  "Independence Day"
2  "Christmas Day"
3  "New Year's Day"
4  "Martin Luther King, Jr. Day"

I was able to do both independently:

(foo['holiday']
 .replace(to_replace=r' \(Observed\)', value='', regex=True)
 .replace(to_replace=r'.+, ', value='', regex=True))

but that caused a problem with 'Martin Luther King, Jr. Day'.
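
To see why the chained version misfires, here is a minimal sketch that mirrors the two replacements with re.sub on plain strings rather than on a pandas Series (the helper name chained_replace is made up for illustration). The second pattern has no way of knowing whether ' (Observed)' was present, so it strips everything up to the last comma in every string:

```python
import re

def chained_replace(name):
    # Step 1: drop the " (Observed)" suffix wherever it appears.
    name = re.sub(r' \(Observed\)', '', name)
    # Step 2: drop everything up to the last ", "; this runs unconditionally.
    return re.sub(r'.+, ', '', name)

print(chained_replace("Christmas Eve, Christmas Day (Observed)"))  # Christmas Day
print(chained_replace("Martin Luther King, Jr. Day"))              # Jr. Day
```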

Chris Nauroth · Accepted Answer

replace.py

import re

input = [
    "Independence Day (Observed)",
    "Christmas Eve, Christmas Day (Observed)",
    "New Year's Eve, New Year's Day (Observed)",
    "Martin Luther King, Jr. Day"
]

for holiday in input:
    # Raw strings keep \( and \2 intact; non-matching names pass through unchanged.
    print(re.sub(r'^(.*?, )?(.*?)( \(Observed\))$', r'\2', holiday))

Output

> python replace.py 
Independence Day
Christmas Day
New Year's Day
Martin Luther King, Jr. Day

Explanation

^: Match at start of string.
(.*?, )?: Match anything followed by a comma and a space. Make it a lazy match, so it doesn't consume the portion of the string we want to keep. The last ? makes the whole thing optional, because some of the sample input doesn't have a comma at all.
(.*?): Grab the part we want for later use in a capturing group. This part is also a lazy match because...
( \(Observed\)): The strings we want to change have " (Observed)" on the end, so we declare that in a separate group here. The lazy match in the prior piece won't consume this. Strings without this suffix don't match at all, so re.sub returns them unchanged.
$: Match at end of string.
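
As a rough illustration of the breakdown above, the sketch below prints what each capturing group holds for the sample inputs. The names PATTERN and strip_observed are invented for this example and are not part of the original script.

```python
import re

# Same pattern as in replace.py, compiled once for reuse.
PATTERN = re.compile(r'^(.*?, )?(.*?)( \(Observed\))$')

def strip_observed(name):
    # Keep only group 2; names that do not match are returned unchanged.
    return PATTERN.sub(r'\2', name)

m = PATTERN.match("Christmas Eve, Christmas Day (Observed)")
print(m.groups())  # ('Christmas Eve, ', 'Christmas Day', ' (Observed)')

m = PATTERN.match("Independence Day (Observed)")
print(m.groups())  # (None, 'Independence Day', ' (Observed)')

# No " (Observed)" suffix, so there is no match at all.
print(PATTERN.match("Martin Luther King, Jr. Day"))  # None
```

For the dataframe in the question, the same pattern and the r'\2' replacement could probably be passed to the Series .replace(..., regex=True) call or to .str.replace(..., regex=True); that has not been tested here.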

Wiktor Stribiżew · Answer

I suggest

r'^(?:.*,\s*)?\b([^,]+)\s+\(Observed\).*'

Replace with r'\1' backreference.

See the regex demo.

Pattern details:

^ - start of string
(?:.*,\s*)? - an optional sequence of:
- .*, - any 0+ chars other than line break chars as many as possible, up to the last occurrence of , on the line and then the ,
- \s* - 0 or more whitespaces
\b - a word boundary
([^,]+) - 1 or more chars other than ,
\s+ - 1 or more whitespaces
\(Observed\) - a literal substring (Observed)
.* - any 0+ chars other than line break chars as many as possible up to the line end.
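
A small sketch of this pattern in use with re.sub, placed next to the lazy pattern from the accepted answer for comparison. The function names and the three-part input string are hypothetical, added only to show where the two approaches differ: with more than one comma, the greedy prefix here keeps only the text after the last comma, while the lazy prefix drops only the text up to the first comma.

```python
import re

GREEDY = re.compile(r'^(?:.*,\s*)?\b([^,]+)\s+\(Observed\).*')
LAZY = re.compile(r'^(.*?, )?(.*?)( \(Observed\))$')

def clean_greedy(name):
    # Pattern from this answer: keep group 1.
    return GREEDY.sub(r'\1', name)

def clean_lazy(name):
    # Pattern from the accepted answer: keep group 2.
    return LAZY.sub(r'\2', name)

samples = [
    "Independence Day (Observed)",
    "Christmas Eve, Christmas Day (Observed)",
    "New Year's Eve, New Year's Day (Observed)",
    "Martin Luther King, Jr. Day",
]
for s in samples:
    # Both patterns agree on the sample data from the question.
    print(clean_greedy(s), "|", clean_lazy(s))

# Hypothetical input with two commas: the results differ.
tricky = "Holiday A, Holiday B, Holiday C (Observed)"
print(clean_greedy(tricky))  # Holiday C
print(clean_lazy(tricky))    # Holiday B, Holiday C
```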

Tags: python, regex, replace, pandas