How to replace multiple substrings in a Pandas series using a dictionary?

I have a pandas Series of strings and want to replace several substrings in each row, for example:

import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today'
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}
testdf = testdf.replace(to_sub, regex=True)  # does not work (only replaces one instance per row)
print(testdf)

In the above case, the desired output is:

Alice went to hospital yesterday
John went to hospizzz yesterday

Note that the first row has three substitutions from the dictionary.

How can I do this efficiently, without processing the series row by row in a for loop?

I tried df.replace(...), as suggested in answers to other questions, but it replaces only a single substring. The result looks like: Alice went to school today, where school and today were not substituted.

Also note that all substitutions for a given row should happen at once: the hospital produced in the first row must not be substituted a second time to hospizzz, which would be wrong.

ksgj1 asked Mar 02 '19 12:03


1 Answer

You can apply the replacements one after another with a helper function:

# Borrowed from an external website
def multipleReplace(text, wordDict):
    # Apply each replacement in turn, in dictionary order
    for key, value in wordDict.items():
        text = text.replace(key, value)
    return text

print(testdf.apply(lambda x: multipleReplace(x, to_sub)))

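With the dictionary from the question (applied in insertion order), this prints:

0    Alice went to hospizzz yesterday
1     John went to hospizzz yesterday

Because the replacements run sequentially, 'tal' also rewrites the 'hospital' produced from 'school', so this approach does not satisfy the all-at-once requirement.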

EDIT

Using the dictionary mentioned in the comments below:

to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz'
}

testdf.apply(lambda x: ' '.join([to_sub.get(i, i) for i in x.split()]))

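Outputs (only whole whitespace-separated words are looked up, so the substring 'tal' inside 'hospital' is left unchanged):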

0    Alice went to hospital yesterday
1     John went to hospital yesterday
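
If every substitution has to happen in a single pass per row, so that text produced by one replacement is never rewritten by another, one option is to combine the keys into a single regular expression and pass a callable as the replacement. The following is a minimal sketch, assuming the testdf and to_sub from the question. The names pattern and result are illustrative, and sorting the keys longest-first is an added precaution (not part of the question) so that a longer key is preferred over a shorter key it starts with.

```python
import re

import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today',
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}

# Escape each key so it is matched literally, then join the keys into one
# alternation. Longer keys come first (an assumption made for safety).
pattern = '|'.join(re.escape(key) for key in sorted(to_sub, key=len, reverse=True))

# Each match is looked up in the dictionary. The inserted replacement text
# is not scanned again, so 'hospital' from 'school' is not turned into 'hospizzz'.
result = testdf.str.replace(pattern, lambda m: to_sub[m.group(0)], regex=True)
print(result)
```

On the sample data this should give Alice went to hospital yesterday for the first row and John went to hospizzz yesterday for the second. The call is still evaluated per string internally, but it avoids an explicit Python loop over rows and over dictionary keys.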
anky answered Nov 15 '22 04:11