
Add/remove custom stop words with spacy

What is the best way to add/remove stop words with spacy? I am using the token.is_stop attribute and would like to make some custom changes to the set. I looked through the documentation but could not find anything about stop words. Thanks!

E.K. asked Dec 15 '16 18:12



1 Answer

Using Spacy 2.0.11, you can update its stopwords set using one of the following:

To add a single stopword:

    import spacy

    nlp = spacy.load("en")
    nlp.Defaults.stop_words.add("my_new_stopword")

To add several stopwords at once:

    import spacy

    nlp = spacy.load("en")
    nlp.Defaults.stop_words |= {"my_new_stopword1", "my_new_stopword2"}

To remove a single stopword:

    import spacy

    nlp = spacy.load("en")
    nlp.Defaults.stop_words.remove("whatever")

To remove several stopwords at once:

    import spacy

    nlp = spacy.load("en")
    nlp.Defaults.stop_words -= {"whatever", "whenever"}
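The four operations above look like ordinary Python set operations, so their edge cases can be tried without loading a model. The sketch below assumes that nlp.Defaults.stop_words behaves like a built-in set, as the calls above suggest. The stop_words variable and its contents are a hypothetical stand-in, not spaCy's real list.

```python
# Stand-in for nlp.Defaults.stop_words (hypothetical contents, illustration only).
stop_words = {"the", "whatever", "whenever"}

stop_words.add("my_new_stopword")          # add a single word
stop_words |= {"foo", "bar"}               # add several words at once
stop_words -= {"whatever", "not_in_set"}   # remove several; absent words are ignored

# remove() raises KeyError when the word is not in the set ...
try:
    stop_words.remove("not_in_set")
    removed = True
except KeyError:
    removed = False

# ... whereas discard() silently does nothing in that case.
stop_words.discard("not_in_set")

print(sorted(stop_words))
```

If a word might not be present, discard() or the -= form avoids the KeyError that remove() would raise.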

Note: To see the current set of stopwords, use:

print(nlp.Defaults.stop_words) 

Update: As noted in the comments, these changes only affect the current execution. To update the model, you can save it with nlp.to_disk("/path") and reload it with nlp.from_disk("/path") (described further at https://spacy.io/usage/saving-loading).
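A different way to keep custom changes across runs, separate from the to_disk/from_disk approach mentioned above, is to store the custom words in a small file and re-apply them at startup. The sketch below uses only the standard library. The function names and the JSON layout are hypothetical, chosen for illustration, and it assumes the target behaves like a built-in set.

```python
import json
from pathlib import Path


def save_custom_stop_words(path, added, removed):
    """Write the custom additions and removals to a JSON file."""
    payload = {"added": sorted(added), "removed": sorted(removed)}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def apply_custom_stop_words(path, stop_words):
    """Re-apply saved additions and removals to a stop-word set, in place."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    stop_words |= set(payload["added"])    # in-place union
    stop_words -= set(payload["removed"])  # in-place difference
    return stop_words
```

With a loaded pipeline, the second function would be called once after spacy.load, for example as apply_custom_stop_words("custom_stop_words.json", nlp.Defaults.stop_words), where the file name is a placeholder.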

Romain answered Oct 17 '22 00:10