I have a list that contains many sentences. I want to iterate through the list, removing from all sentences words like "and", "the", "a", "are", etc.
I tried this:
def removearticles(text):
    articles = {'a': '', 'an': '', 'and': '', 'the': ''}
    for i, j in articles.items():
        text = text.replace(i, j)
    return text
As you can probably tell, however, this will also remove "a" and "an" when they appear in the middle of a word. I need to remove only the instances of these words that are delimited by blank space, not those inside another word. What is the most efficient way of going about this?
Remove All Occurrences of a Character From a String in Python Using the translate() Method. We can also use the translate() method to remove characters from a string. The translate() method, when invoked on a string, takes a translation table as an input argument.
Python Remove Character from String using translate(): Python's string translate() function replaces each character in the string using the given translation table. We have to specify the Unicode code point for the character and None as a replacement to remove it from the result string.
Using translate(): translate() is another method that can be used to remove a character from a string in Python. It returns a copy of the string in which the characters listed in the table have been replaced or removed. Also, remember that to remove a character from a string using translate() you have to map it to None and not "".
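As a minimal sketch of the translate() approach described above, assuming Python 3: the helper name `remove_chars` is made up for illustration. Note that this removes individual characters, not whole words, so on its own it does not solve the original question.

```python
def remove_chars(text, chars):
    # Map each character's Unicode code point to None so translate() drops it.
    table = {ord(c): None for c in chars}
    return text.translate(table)


print(remove_chars("banana bread", "an"))  # b bred
```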
Use str.replace() to remove multiple characters from a string.
I would go for regex, something like:
import re

def removearticles(text):
    # Raw strings keep \1 and \3 as backreferences rather than octal escapes.
    return re.sub(r'(\s+)(a|an|and|the)(\s+)', r'\1\3', text)
or if you want to remove the leading whitespace as well:
def removearticles(text):
    return re.sub(r'\s+(a|an|and|the)(\s+)', r'\2', text)
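One caveat with patterns that require whitespace on both sides: they miss a word at the very start or end of a sentence, and they can skip the second of two adjacent matches because the shared space is already consumed. A possible alternative, sketched here under the assumption that matching on word boundaries (`\b`) is acceptable, is shown below. The names `STOP_WORDS`, `remove_stop_words`, and `clean_sentences` are hypothetical, and the word list is only an example.

```python
import re

# Hypothetical word list for illustration; extend it as needed.
STOP_WORDS = ("a", "an", "and", "the", "are")

# \b matches only at word boundaries, so the "a" inside "banana" is untouched.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b",
    re.IGNORECASE,
)


def remove_stop_words(text):
    stripped = _PATTERN.sub("", text)
    # Collapse the extra whitespace left behind by the removed words.
    return " ".join(stripped.split())


def clean_sentences(sentences):
    # Apply the same cleanup to every sentence in the list.
    return [remove_stop_words(s) for s in sentences]


print(clean_sentences(["The cat and the dog are in a banana tree"]))
```

Compiling the pattern once and reusing it avoids rebuilding it for every sentence. Keep in mind that `\b` also treats hyphens and apostrophes as boundaries, so the "a" in a token such as "a-frame" would be removed as well.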