
Python regular expression to replace everything but specific words

Tags: python, regex

I am trying to do the following with a regular expression:

import re
x = re.compile('[^(going)|^(you)]')    # words to replace
s = 'I am going home now, thank you.' # string to modify
print(re.sub(x, '_', s))

The result I get is:

'_____going__o___no______n__you_'

The result I want is:

'_____going_________________you_'

Since ^ only acts as negation inside brackets [], and a bracket expression matches a single character, this result makes sense, but I'm not sure how else to go about it.

I even tried '([^g][^o][^i][^n][^g])|([^y][^o][^u])' but it yields '_g_h___y_'.
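
A minimal sketch of why the first pattern behaves this way, assuming Python 3. Everything between [ and ] is read as a set of single characters, so the pattern matches any one character that is not (, ), |, ^ or a letter of "going" or "you". The names kept, by_regex and by_set are only for illustration.

```python
import re

s = 'I am going home now, thank you.'

# Inside [...], the parentheses, | and the second ^ are literal characters;
# only the leading ^ negates the set.
by_regex = re.sub('[^(going)|^(you)]', '_', s)

# The same result using plain set membership, one character at a time.
kept = set('(going)|^(you)')
by_set = ''.join(c if c in kept else '_' for c in s)

print(by_regex)
print(by_regex == by_set)
```

This is why the stray o, n and no survive: each is a letter of one of the two words, even though it is not part of that word in the string.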

TimY asked Jul 06 '16 09:07



1 Answer

Not quite as easy as it first appears, since REs have no simple way to say "not this word": ^ inside [ ] negates a set of characters and only ever matches one character (as you found). Here is my solution:

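import re

def subit(m):
    # stuff: the text before a kept word; word: 'going', 'you', or '' at the end
    stuff, word = m.groups()
    # Replace each character of stuff with an underscore and keep the word as-is
    return ("_" * len(stuff)) + word

s = 'I am going home now, thank you.' # string to modify

print(re.sub(r'(.+?)(going|you|$)', subit, s))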

Gives:

_____going_________________you_

To explain: the RE itself (I always use raw strings) matches one or more of any character (.+) but is non-greedy (?), so it stops at the first point where the rest of the pattern can match. This is captured in the first group (the first pair of parentheses). That is followed by either "going" or "you" or the end of the string ($), which is captured in the second group.
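
To make the two groups concrete, here is a small sketch (assuming Python 3) that lists what each match of the same pattern captures. The name pieces is only for illustration.

```python
import re

s = 'I am going home now, thank you.'

# Each match yields (text before the word, the word itself);
# the last match ends at the end of the string, so its second group is empty.
pieces = [m.groups() for m in re.finditer(r'(.+?)(going|you|$)', s)]
for stuff, word in pieces:
    print(repr(stuff), repr(word))
```

There are three matches: 'I am ' followed by 'going', ' home now, thank ' followed by 'you', and the final '.' followed by an empty string at the end.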

subit is a function (you can call it anything within reason) which is called for each substitution. A match object is passed, from which we can retrieve the captured groups. We only need the length of the first group, since we are replacing each of its characters with an underscore. The returned string replaces the text that matched the pattern.
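
If the list of words varies, the same callback idea can be wrapped in a helper. The following is only a sketch of a variation, not the pattern used above: it matches either a kept word or any single character, and fills in the single characters. The function name mask_except and its parameters are hypothetical, and it assumes Python 3 and a non-empty list of non-empty words.

```python
import re

def mask_except(text, keep_words, fill='_'):
    """Replace every character of text with fill, except occurrences of keep_words."""
    # Longest words first so that e.g. 'going' wins over 'go' in the alternation.
    words = sorted(keep_words, key=len, reverse=True)
    # Group 1 matches a kept word; otherwise '.' matches any single character.
    pattern = re.compile('(' + '|'.join(re.escape(w) for w in words) + ')|.', re.DOTALL)
    # group(1) is None when the match was a single character rather than a word.
    return pattern.sub(lambda m: m.group(1) if m.group(1) is not None else fill, text)

s = 'I am going home now, thank you.'
print(mask_except(s, ['going', 'you']))
print(mask_except('goingyou', ['going', 'you']))
```

Because (.+?) needs at least one character before each word, the original pattern turns 'goingyou' into '_____you'. This variation keeps both words when they sit back to back, at the cost of one callback call per replaced character.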

cdarke answered Nov 14 '22 20:11
