 

Replace more than one pattern in Python

I have looked at various links, but they all show how to replace multiple words in one pass. Instead of words, I want to replace patterns, e.g.:

RT @amrightnow: "The Real Trump" Trump About You" Watch Make #1 https:\/\/t.co\/j58e8aacrE #tcot #pjnet #1A #2A #Trump #trump2016 https:\/\/t.co\u2026

When I run the following two substitutions on the text above, I get the desired output:

import re

result = re.sub(r"http\S+","",sent)
result1 = re.sub(r"@\S+","",result)

This removes all the URLs and @handles from the tweet. The output looks like this:

>>> result1
'RT  "The Real Trump" Trump About You" Watch Make #1  #tcot #pjnet #1A #2A #Trump #trump2016 '

What is the best way to do this with a single substitution? I will be reading tweets from a file, and for each tweet I want to replace these handles and URLs with blanks.

asked Oct 28 '25 15:10 by user1122534



1 Answer

You need the regex "or" operator (alternation), which is the pipe |:

re.sub(r"http\S+|@\S+","",sent)
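As a quick check, the sketch below runs the question's two-pass version and the single alternation side by side on the sample tweet from the question. The tweet is written as a raw string, so the \/ and \u2026 sequences stay literal text, which is an assumption about how the tweets are stored. For this input both give the same result, although the two approaches are not guaranteed to match in every edge case: in the two-pass version the second pattern only sees text that the first pass has already modified.

```python
import re

# Sample tweet from the question, as raw text (\/ and \u2026 are kept literally).
sent = r'RT @amrightnow: "The Real Trump" Trump About You" Watch Make #1 https:\/\/t.co\/j58e8aacrE #tcot #pjnet #1A #2A #Trump #trump2016 https:\/\/t.co\u2026'

# Two passes, as in the question: strip URLs first, then @handles.
two_pass = re.sub(r"@\S+", "", re.sub(r"http\S+", "", sent))

# One pass: at each position the engine tries http\S+ first, then @\S+.
one_pass = re.sub(r"http\S+|@\S+", "", sent)

print(one_pass == two_pass)  # True
print(repr(one_pass))
```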

If you have a long list of patterns that you want to remove, a common trick is to use '|'.join to build a single regular expression:

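import re

# raw strings, so \S reaches the regex engine instead of being treated as a string escape
to_match = [r'http\S+',
            r'@\S+',
            'something_else_you_might_want_to_remove']

re.sub('|'.join(to_match), '', sent)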
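For the file-reading part of the question, one option is to compile the joined pattern once and apply it to every line. This is only a sketch under a few assumptions: one tweet per line, and the names TO_MATCH, CLEANER, clean_tweets and the sample text are all hypothetical. io.StringIO stands in for a real file such as open("tweets.txt", encoding="utf-8").

```python
import io
import re
from collections.abc import Iterable, Iterator

# Hypothetical list of patterns to strip (same idea as to_match above).
TO_MATCH = [
    r"http\S+",  # URLs
    r"@\S+",     # @handles
]

# Compile the combined alternation once instead of for every tweet.
CLEANER = re.compile("|".join(TO_MATCH))


def clean_tweets(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line (minus its newline) with URLs and @handles removed."""
    for line in lines:
        yield CLEANER.sub("", line.rstrip("\n"))


if __name__ == "__main__":
    # Stand-in for an open file object, one tweet per line.
    sample = io.StringIO("RT @someone: hello https://t.co/abc #tag\nno links here\n")
    for cleaned in clean_tweets(sample):
        print(repr(cleaned))
```

A file object iterates line by line, so clean_tweets never needs to load the whole file into memory.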
answered Oct 31 '25 04:10 by maxymoo
