 

Removing multiple phrases from string column efficiently

I want to remove a few phrases from a column, and the code below works fine:

finaldata['keyword'] = finaldata['keyword'].str.replace("Washington Times", "")
finaldata['keyword'] = finaldata['keyword'].str.replace("Washington Post", "")
finaldata['keyword'] = finaldata['keyword'].str.replace("Mail The Globe", "")

Now I have around 30 words to remove, but I can't repeat this line of code 30 times. Is there any way to solve this? If so, please guide me.

Rahul Varma asked Dec 19 '18 04:12



1 Answer

You can use regex here and reduce this to a single replace call.

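import re

words = ["Washington Times", "Washington Post", "Mail The Globe"]
# Escape each phrase so regex metacharacters are matched literally
p = '|'.join(map(re.escape, words))

# regex=True is required in pandas >= 2.0, where the default became False
finaldata['keyword'] = finaldata['keyword'].str.replace(p, '', regex=True)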
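Because the joined string is treated as a regular expression, two details can bite as the list grows to 30 phrases. First, metacharacters such as parentheses inside a phrase change what it matches. Second, alternation takes the first alternative that matches, so a short phrase listed before a longer one that starts with it (here "Washington" before "Washington Post") leaves a fragment behind. The following is a minimal pure-Python sketch using only the standard re module; the phrases and the helper name build_pattern are hypothetical, chosen just to trigger both pitfalls. It escapes each phrase and tries longer phrases first:

```python
import re


def build_pattern(phrases):
    """Compile an alternation that matches any of the phrases literally."""
    # Longest first, so "Washington Post" is tried before "Washington".
    ordered = sorted(phrases, key=len, reverse=True)
    # re.escape makes characters such as '(' and '.' match themselves.
    return re.compile('|'.join(re.escape(p) for p in ordered))


# Hypothetical phrases chosen to trigger both pitfalls.
phrases = ["Washington", "Washington Post", "Daily Mail (UK)"]
text = "Washington Post, Daily Mail (UK)"

naive = re.compile('|'.join(phrases))  # unescaped, original order
safe = build_pattern(phrases)

print(repr(naive.sub('', text)))  # ' Post, Daily Mail (UK)'
print(repr(safe.sub('', text)))   # ', '
```

The compiled pattern can be used directly in the list comprehension approach, and its .pattern attribute holds the string form if you prefer to pass it to str.replace with regex=True.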

For performance, if the column has no NaNs, consider a list comprehension with a precompiled pattern instead.

import re

p2 = re.compile(p)
# Compiled patterns use .sub(repl, string); they have no .replace method
finaldata['keyword'] = [p2.sub('', text) for text in finaldata['keyword']]

If there are NaNs, select the non-null rows with a boolean mask and reassign them with loc:

m = finaldata['keyword'].notna()
# Only rewrite the non-null rows; NaNs are left as they are
finaldata.loc[m, 'keyword'] = [
    p2.sub('', text) for text in finaldata.loc[m, 'keyword'].tolist()]
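A possible alternative to the mask is to guard each element inside the comprehension, since NaN in a pandas object column is a float and None is not a string either. This is a sketch, not the answer's method; the helper name strip_phrases is hypothetical, and it runs on a plain Python list so it needs only the standard library:

```python
import math
import re

# Pattern built as in the answer; the phrases are taken from the question.
p2 = re.compile('|'.join(map(re.escape, ["Washington Times", "Washington Post"])))


def strip_phrases(values, pattern):
    """Remove every match of pattern from string values; pass others through."""
    # NaN (a float) and None are not str, so they are returned unchanged.
    return [pattern.sub('', v) if isinstance(v, str) else v for v in values]


values = ["Washington Post daily", float('nan'), None, "no match"]
result = strip_phrases(values, p2)
print(result)  # [' daily', nan, None, 'no match']
```

With the DataFrame from the question this would be finaldata['keyword'] = strip_phrases(finaldata['keyword'], p2), since iterating a Series yields its values. Unlike the mask version, it also leaves non-string, non-null entries (numbers, for example) untouched instead of raising a TypeError.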
cs95 answered Oct 13 '22 18:10
