I'm trying to match and remove all words in a list from a string using a compiled regex, but I'm struggling to avoid matching occurrences inside other words.
Current:
REMOVE_LIST = ["a", "an", "as", "at", ...]
remove = '|'.join(REMOVE_LIST)
regex = re.compile(r'('+remove+')', flags=re.IGNORECASE)
out = regex.sub("", text)
In: "The quick brown fox jumped over an ant"
Out: "quick brown fox jumped over t"
Expected: "quick brown fox jumped over"
I've tried changing the pattern to the following, but to no avail:
regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)
Any suggestions, or am I missing something glaringly obvious?
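One likely culprit in the attempt above: the `r` prefix applies only to the first string literal, so the `\b` in the second literal `')\b'` is read by Python as a backspace character rather than a regex word boundary. Below is a minimal sketch of a fix, not a definitive implementation. The word list is illustrative (`"the"` is an assumption, since the original list is elided), `strip_words` is a hypothetical helper name, and the `re.escape` call and whitespace cleanup are additions that were not in the original code.

```python
import re

# Illustrative list; "the" is an assumption because the original list is elided.
REMOVE_LIST = ["a", "an", "as", "at", "the"]

# re.escape guards against entries that contain regex metacharacters.
remove = "|".join(re.escape(word) for word in REMOVE_LIST)

# Both literals carry the r prefix, so each \b reaches the regex engine
# as a word boundary instead of being turned into a backspace by Python.
regex = re.compile(r"\b(?:" + remove + r")\b", flags=re.IGNORECASE)


def strip_words(text):
    # Remove the listed words, then collapse the whitespace left behind.
    return " ".join(regex.sub("", text).split())


print(strip_words("The quick brown fox jumped over an ant"))
# quick brown fox jumped over ant
```

Note that "ant" survives here because it is not itself in the list; the word boundaries only stop "an" from matching inside it.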
Method 3: Using remove()
In this method, we iterate through each item in the list, and when we find a match for the item to be removed, we call the remove() method on the list.
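A small sketch of that loop, with made-up sample data. One detail is an addition to the description above: the loop walks a copy of the list, because removing items from a list while iterating over that same list can skip elements.

```python
items = ["a", "b", "a", "c"]
target = "a"

# Iterate over a shallow copy (items[:]) so that removals
# do not shift the positions of the list being walked.
for item in items[:]:
    if item == target:
        items.remove(item)

print(items)  # ['b', 'c']
```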
Python: Remove a Character from a String using replace()
We can use the string replace() method to replace a character with a new one. If we pass an empty string as the second argument, the character is removed from the string.
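For illustration, a short sketch with made-up strings. Strings are immutable, so replace() returns a new string, and it works on substrings rather than whole words, which means it has the same within-word problem as the regex in the question.

```python
text = "banana"

# Every "a" is removed; the original string is left unchanged.
without_a = text.replace("a", "")
print(without_a)  # bnn

# The optional third argument limits the number of replacements.
first_only = text.replace("a", "", 1)
print(first_only)  # bnana

# replace() matches substrings, so "an" is also removed from inside "ant".
inside_word = "an ant".replace("an", "")
print(repr(inside_word))  # ' t'
```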
The remove() Method Removes the First Occurrence of an Item in a List
A thing to keep in mind when using the remove() method is that it searches for and removes only the first instance of an item.
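A quick sketch of that behaviour, using a made-up list:

```python
words = ["an", "ant", "an"]

# Only the first "an" is removed; the second one stays in place.
words.remove("an")
print(words)  # ['ant', 'an']

# list.remove raises ValueError when the item is not present.
try:
    words.remove("missing")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```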
Here is a suggestion without regex that you may want to consider:
>>> sentence = 'word1 word2 word3 word1 word2 word4'
>>> remove_list = ['word1', 'word2']
>>> word_list = sentence.split()
>>> ' '.join([i for i in word_list if i not in remove_list])
'word3 word4'
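The same idea applied to the sentence from the question might look like the sketch below. The comparison above is case-sensitive, so lowercasing each word before the lookup is an addition, as is the use of a set for faster membership tests. The contents of `remove_set` are illustrative (`"the"` is assumed, since the original list is elided).

```python
# Illustrative set; "the" is an assumption because the original list is elided.
remove_set = {"a", "an", "as", "at", "the"}

sentence = "The quick brown fox jumped over an ant"

# Lowercase each word for the lookup so that "The" matches "the".
kept = [word for word in sentence.split() if word.lower() not in remove_set]
result = " ".join(kept)

print(result)  # quick brown fox jumped over ant
```

One limitation of splitting on whitespace: punctuation stays attached to words, so "an," would not match "an".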