I am trying to remove words from a string if they match a list.
x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"
tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']
print(x)
for tag in tags:
    if tag in x:
        print(x.replace(tag, ''))
It produces this output:
How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (-LOL) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (HDTV-) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (HDTV-LOL) [] - Mon, 20 Feb 2012
I want it to remove all the words matching the list.
You are not keeping the result of x.replace(). Strings are immutable, so replace() returns a new string and leaves x unchanged. Try the following instead:
for tag in tags:
    x = x.replace(tag, '')
print(x)
Note that your approach matches any substring, and not just full words. For example, it would remove the LOL in RUN LOLA RUN.
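To make the substring problem concrete, here is a small sketch. The variable names title and stripped are only for illustration and are not part of the original code:

```python
# str.replace() removes every occurrence of the substring,
# even when it sits in the middle of a longer word.
title = "RUN LOLA RUN"
stripped = title.replace("LOL", "")
print(stripped)  # RUN A RUN
```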
One way to address this would be to enclose each tag in a pair of r'\b' strings, and search for the resulting regular expression. The r'\b' matches only at word boundaries:
import re
for tag in tags:
    # re.escape keeps any regex metacharacters in a tag literal
    x = re.sub(r'\b' + re.escape(tag) + r'\b', '', x)
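As a variation on the loop above, the tags can probably be combined into a single alternation so the string is scanned only once. This is a minimal sketch, not the only way to do it; the helper name remove_tags is hypothetical and chosen just for this example:

```python
import re

def remove_tags(text, tags):
    """Remove every whole-word occurrence of any tag from text."""
    # Build one pattern of the form \b(?:TAG1|TAG2|...)\b.
    # re.escape guards against tags that contain regex metacharacters.
    alternation = '|'.join(re.escape(tag) for tag in tags)
    pattern = re.compile(r'\b(?:' + alternation + r')\b')
    return pattern.sub('', text)

x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"
tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']

cleaned = remove_tags(x, tags)
print(cleaned)  # How I Met Your Mother 7x17 (-) [] - Mon, 20 Feb 2012

# A tag embedded in a longer word is left alone.
print(remove_tags("RUN LOLA RUN", tags))  # RUN LOLA RUN
```

Note that the surrounding punctuation, such as (-) and [], is left behind; removing it would need a separate cleanup step.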