Python regular expression to replace everything but specific words

Question

I am trying to use a regular expression to replace every character except the words "going" and "you" with an underscore:

import re
x = re.compile('[^(going)|^(you)]')    # meant to match everything except 'going' and 'you'
s = 'I am going home now, thank you.' # string to modify
print(re.sub(x, '_', s))

The result I get is:

'_____going__o___no______n__you_'

The result I want is:

'_____going_________________you_'

Since ^ only means "not" inside brackets [], where it negates individual characters rather than whole words, this result makes sense, but I'm not sure how else to go about it.
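A quick way to check this: inside [...] everything after the leading ^ is literal, so the pattern should behave exactly like a negated class built from its distinct characters ( g o i n ) | ^ y u. The sketch below (the names asker_pattern and explicit are just illustrative) builds that class explicitly and compares the two results.

```python
import re

s = 'I am going home now, thank you.'

# Only the first ^ negates; the parentheses, |, the second ^ and the letters
# are all literal members of the class, so this excludes single characters.
asker_pattern = re.compile('[^(going)|^(you)]')

# Build the equivalent class from the distinct characters, escaping each one
# so that (, ), ^ and | are treated literally inside the brackets.
excluded = sorted(set('(going)|^(you)'))
explicit = re.compile('[^' + ''.join(re.escape(c) for c in excluded) + ']')

print(asker_pattern.sub('_', s))
print(explicit.sub('_', s))
```

Both lines should print the same '_____going__o___no______n__you_' seen in the question: every o, n, y and u survives wherever it appears, not just inside the two words.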

I even tried '([^g][^o][^i][^n][^g])|([^y][^o][^u])' but it yields '_g_h___y_'.
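Listing the individual matches shows why this attempt fails: each [^x] rejects one letter at one fixed position, so any 5-character (or 3-character) run that avoids those letters in those positions matches, and each whole run collapses to a single underscore. A small sketch using re.finditer (the variable name attempt is illustrative):

```python
import re

s = 'I am going home now, thank you.'
attempt = re.compile('([^g][^o][^i][^n][^g])|([^y][^o][^u])')

# Print every chunk the pattern consumed; each one becomes a single '_'.
for m in attempt.finditer(s):
    print(repr(m.group()))

print(attempt.sub('_', s))
```

The matched chunks are 'I am ', 'oing ', 'ome n', 'ow, t', 'hank ' and 'ou.', and the characters between them (g, h, y) are left alone, which is exactly where '_g_h___y_' comes from.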

cdarke · Accepted Answer

Not quite as easy as it first appears, since REs have no simple "not" for a whole word: ^ inside [ ] negates only a single character (as you found). Here is my solution:

import re

def subit(m):
    # Group 1 is the text to blank out; group 2 is the keyword ('' at the end).
    stuff, word = m.groups()
    return ("_" * len(stuff)) + word

s = 'I am going home now, thank you.' # string to modify

print(re.sub(r'(.+?)(going|you|$)', subit, s))

Gives:

_____going_________________you_

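To explain: the RE itself (I always use raw strings) matches one or more of any character except a newline (.+), and the ? makes it non-greedy, so it stops at the first position where the rest of the pattern can match. This run is captured by the first parenthesized group. It is followed by either "going", "you", or the end of the string ($), which is captured by the second group.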
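To see what the two groups hold for this input, re.finditer can be run over the same pattern; a short sketch:

```python
import re

s = 'I am going home now, thank you.'

# Each match is (run of text before a keyword, keyword or '' at end of string).
for m in re.finditer(r'(.+?)(going|you|$)', s):
    print(m.groups())
```

It should print ('I am ', 'going'), (' home now, thank ', 'you') and ('.', ''), the last one being the match that ends at $.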

subit is a function (you can call it anything within reason) that re.sub calls once for each match. It receives a match object, from which we retrieve the captured groups. For the first group we only need the length, since each of its characters is replaced with an underscore; the second group (the keyword, or an empty string at the end) is returned unchanged. The string the function returns replaces the whole text matched by the pattern.
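One edge case worth checking, assuming a keyword can also appear at the very start of the string: because .+ needs at least one character before the keyword, a leading "going" is absorbed into group 1 and blanked out. Allowing an empty run with .*? appears to fix this. A sketch with a hypothetical test string:

```python
import re

def subit(m):
    # Blank out the non-keyword run, keep the keyword unchanged.
    stuff, word = m.groups()
    return ("_" * len(stuff)) + word

s = 'going home, you'  # hypothetical input starting with a keyword

# .+? forces at least one character before the keyword, so 'going' is lost.
print(re.sub(r'(.+?)(going|you|$)', subit, s))

# .*? lets group 1 be empty, so the leading keyword is kept.
print(re.sub(r'(.*?)(going|you|$)', subit, s))
```

The first call prints '____________you' and the second 'going_______you'. With .*? there may also be a final empty match at $, but subit returns an empty string for it, so the output is unaffected.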


Tags: python, regex

TimY

1 answer

cdarke
