I have a pandas Series of strings and want to replace several substrings in each row, for example:
import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today'
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}
testdf = testdf.replace(to_sub, regex=True) # does not work (only replaces one instance per row)
print(testdf)
In the above case, the desired output is:
Alice went to hospital yesterday.
John went to hospizzz yesterday.
Note that the first row has three substitutions from the dictionary.
How can I do this efficiently, without processing the rows one by one in a for loop?
I tried df.replace(...), as suggested in many answers to other questions, but that only replaces a single substring. The result looks like: Alice went to school today, where school and today weren't substituted.
Another thing to note is that the substitutions should happen all at once for any single row (the hospital in the first row must not be substituted a second time to hospizzz, which would be wrong).
You can use:
# Borrowed from an external website
def multipleReplace(text, wordDict):
    # Apply each replacement in turn, in dictionary order
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

print(testdf.apply(lambda x: multipleReplace(x, to_sub)))
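0 Alice went to hospital yesterday
1 John went to hospital yesterday
(This is the output for a dictionary without the 'tal' entry. Because the replacements are applied one after another, adding 'tal': 'zzz' also turns 'hospital' into 'hospizzz' in both rows, including the 'hospital' that was just substituted for 'school'.)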
EDIT
Using the dictionary as mentioned in the comments below:
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz'
}
# Look up each whitespace-separated word; words not in the dictionary are kept as-is
testdf.apply(lambda x: ' '.join([to_sub.get(i, i) for i in x.split()]))
Outputs:
0 Alice went to hospital yesterday
1 John went to hospital yesterday
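Neither approach above does a true single pass over arbitrary substrings: the loop applies the replacements one after another, and the split-based version only matches whole words, so 'tal' inside 'hospital' is never touched. One possible alternative is to combine all keys into a single regular expression and let a callback pick the replacement for each match. This is only a sketch, and it assumes the dictionary keys are literal strings rather than regex patterns (hence re.escape) and a pandas version whose Series.str.replace accepts a callable. The names keys, pattern and result are illustrative.

```python
import re
import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today',
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}

# Try longer keys first so a longer key wins over a shorter one
# that could match at the same position.
keys = sorted(to_sub, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys))

# Each match is replaced exactly once, based on the original text,
# so already-substituted text is never scanned again.
result = testdf.str.replace(pattern, lambda m: to_sub[m.group(0)], regex=True)
print(result)
```

For the sample data this should give Alice went to hospital yesterday for the first row and John went to hospizzz yesterday for the second, which matches the output asked for in the question.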