Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove list of words from a list of strings

Sorry if the question is bit confusing. This is similar to this question

I think this the above question is close to what I want, but in Clojure.

There is another question

I need something like this but instead of '[br]' in that question, there is a list of strings that need to be searched and removed.

Hope I made myself clear.

I think that this is due to the fact that strings in python are immutable.

I have a list of noise words that need to be removed from a list of strings.

If I use the list comprehension, I end up searching the same string again and again. So, only "of" gets removed and not "the". So my modified list looks like this

places = ['New York', 'the New York City', 'at Moscow' and many more]

noise_words_list = ['of', 'the', 'in', 'for', 'at']

for place in places:
    stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]

I would like to know as to what mistake I'm doing.

like image 613
prabhu Avatar asked Aug 18 '10 09:08

prabhu


1 Answers

>>> import re
>>> noise_words_list = ['of', 'the', 'in', 'for', 'at']
>>> phrases = ['of New York', 'of the New York']
>>> noise_re = re.compile('\\b(%s)\\W'%('|'.join(map(re.escape,noise_words_list))),re.I)
>>> [noise_re.sub('',p) for p in phrases]
['New York', 'New York']
like image 108
John La Rooy Avatar answered Oct 22 '22 20:10

John La Rooy