Adding words to nltk stoplist

I have some code that removes stop words from my data set. The default stop list doesn't remove most of the words I would like it to, so I'm looking to add words to it for this case. The code I'm using to remove stop words is:

word_list2 = [w.strip() for w in word_list if w.strip() not in nltk.corpus.stopwords.words('english')]

I'm unsure of the correct syntax for adding words and can't seem to find the correct one anywhere. Any help is appreciated. Thanks.

Alex asked Apr 01 '11 09:04



2 Answers

You can simply use the append method to add words to it:

import nltk
stopwords = nltk.corpus.stopwords.words('english')
stopwords.append('newWord')

Or use extend to add a whole list of words at once, as suggested by Charlie in the comments:

stopwords = nltk.corpus.stopwords.words('english')
newStopWords = ['stopWord1','stopWord2']
stopwords.extend(newStopWords)
Oziel Carneiro answered Oct 24 '22 18:10


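import nltk

# Start from NLTK's English stop list (a plain Python list).
stopwords = nltk.corpus.stopwords.words('english')
new_words = ('re', 'name', 'user', 'ct')
# Append each custom word to the list.
for word in new_words:
    stopwords.append(word)
print(stopwords)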
svp answered Oct 24 '22 20:10
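
To tie the answers back to the filtering code in the question, here is a minimal sketch of using an extended stop list. The filter has to check against the extended collection, not a fresh call to nltk.corpus.stopwords.words('english'), which as far as I can tell returns a new, unmodified list each time. The base_stopwords list below is a short stand-in (an assumption, so the sketch runs without the NLTK corpus downloaded), and word_list is made-up sample data. In real use you would presumably set base_stopwords = nltk.corpus.stopwords.words('english').

```python
# Stand-in for nltk.corpus.stopwords.words('english'); an assumption made
# so this sketch runs without the NLTK stopwords corpus installed.
base_stopwords = ['i', 'me', 'my', 'the', 'a', 'an', 'of', 'in']

# Extra words to drop, borrowed from the second answer's example.
extra_stopwords = ['re', 'name', 'user', 'ct']

# Build the combined collection once, outside the comprehension.
# A set makes each membership test cheap.
stop_set = set(base_stopwords) | set(extra_stopwords)

# Hypothetical sample data.
word_list = ['the', ' user ', 'likes', 'nltk', 're', 'a']

# Same filtering as in the question, but against the combined set.
word_list2 = [w.strip() for w in word_list if w.strip() not in stop_set]

print(word_list2)  # ['likes', 'nltk']
```

Building the set once also avoids reloading the stop list for every word, which the one-liner in the question does.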