I am trying to remove words from a string if they match a list.
x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"
tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']
print x
for tag in tags:
    if tag in x:
        print x.replace(tag, '')
It produces this output:
How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (-LOL) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (HDTV-) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (HDTV-LOL) [] - Mon, 20 Feb 2012
I want it to remove all the words matching the list.
You are not keeping the result of x.replace(). Try the following instead:
for tag in tags:
    x = x.replace(tag, '')
print x
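The underlying reason is that Python strings are immutable: str.replace() returns a new string and leaves the original untouched. A minimal sketch of the difference, using a shortened, made-up title rather than the one from the question:

```python
title = "Show 7x17 (HDTV-LOL)"  # shortened, hypothetical example string

# replace() returns a new string; the result is discarded here,
# so `title` still holds the original text.
title.replace("HDTV", "")
unchanged = title

# Binding the return value to a name keeps the result.
kept = title.replace("HDTV", "")

print(unchanged)  # Show 7x17 (HDTV-LOL)
print(kept)       # Show 7x17 (-LOL)
```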
Note that your approach matches any substring, and not just full words. For example, it would remove the LOL in RUN LOLA RUN.
One way to address this would be to wrap each tag in r'\b' on both sides and substitute using the resulting regular expression. The r'\b' matches only at word boundaries:
import re
for tag in tags:
    x = re.sub(r'\b' + tag + r'\b', '', x)
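As a variation on the loop above, the tags could also be combined into a single pattern so the string is scanned only once. This is just a sketch: the helper name strip_tags is made up for illustration, and re.escape is added as a precaution in case a tag ever contains regex metacharacters (none of the tags in the question do).

```python
import re

def strip_tags(text, tags):
    """Remove every whole-word occurrence of any tag from text.

    strip_tags is a hypothetical helper name, not part of any library.
    """
    # Escape each tag, then join them into one alternation,
    # e.g. \b(?:HDTV|LOL|VTV)\b
    pattern = r'\b(?:' + '|'.join(re.escape(tag) for tag in tags) + r')\b'
    return re.sub(pattern, '', text)

tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']
x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"

print(strip_tags(x, tags))
# How I Met Your Mother 7x17 (-) [] - Mon, 20 Feb 2012
print(strip_tags("RUN LOLA RUN", tags))
# RUN LOLA RUN
```

Note that only the tags themselves are removed; the surrounding punctuation such as (-) and [] is left in place and would need a separate cleanup step if it is unwanted.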