I have some code that removes stop words from my data set, as the stop list doesn't seem to remove a majority of the words I would like it too, I'm looking to add words to this stop list so that it will remove them for this case. The code i'm using to remove stop words is:
word_list2 = [w.strip() for w in word_list if w.strip() not in nltk.corpus.stopwords.words('english')]
I'm unsure of the correct syntax for adding words and can't seem to find the correct one anywhere. Any help is appreciated. Thanks.
Stop words are words that are so common they are basically ignored by typical tokenizers. By default, NLTK (Natural Language Toolkit) includes a list of 40 stop words, including: “a”, “an”, “the”, “of”, “in”, etc. The stopwords in nltk are the most common words in data.
By default, Spacy has 326 English stopwords, but at times you may like to add your own custom stopwords to the default list. We will show you how in the below example. To add a custom stopword in Spacy, we first load its English language model and use add() method to add stopwords.
We are using “|” symbol to add these 2 Stop Words because in python | Symbol acts as a Union Set Operator. Means, If these 2 words are not present in the list then and only then they will be added to stop words list otherwise they will be discarded.
To add a word to NLTK stop words collection, first create an object from the stopwords. words('english') list. Next, use the append() method on the list to add any word to the list. The following script adds the word play to the NLTK stop word collection.
You can simply use the append method to add words to it:
stopwords = nltk.corpus.stopwords.words('english')
stopwords.append('newWord')
or extend to append a list of words, as suggested by Charlie on the comments.
stopwords = nltk.corpus.stopwords.words('english')
newStopWords = ['stopWord1','stopWord2']
stopwords.extend(newStopWords)
import nltk
stopwords = nltk.corpus.stopwords.words('english')
new_words=('re','name', 'user', 'ct')
for i in new_words:
stopwords.append(i)
print(stopwords)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With