What is the best way to add/remove stop words with spaCy? I am using the token.is_stop attribute and would like to make some custom changes to the set. I looked through the documentation but could not find anything about stop words. Thanks!
By default, spaCy has 326 English stop words, but at times you may want to add your own custom stop words to the default list. To add a custom stop word in spaCy, first load the English language model, then call the add() method on the stop-word set.
To remove a word from the set of stop words in spaCy, pass that word to the set's remove() method. Output: ['Nick', 'play', 'football', ',', 'not', 'fond', '. ']
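The add and remove steps described above can be sketched as follows. This is a minimal illustration under a few assumptions, not the original answer's code: it uses spacy.blank("en") so no trained model has to be downloaded, the sample sentence and the words "football" and "not" are made up for the example, and the matching lexeme flag in nlp.vocab is set explicitly in case token.is_stop has already been cached for that word.

```python
import spacy

# Assumption: a blank English pipeline is enough here, since stop words
# come from the language defaults rather than from a trained model.
nlp = spacy.blank("en")

# Add a custom stop word to the default set, and also flag the lexeme
# directly so token.is_stop reflects the change for this word.
nlp.Defaults.stop_words.add("football")
nlp.vocab["football"].is_stop = True

# Remove a default stop word ("not" is in the default English list)
# and clear the flag on its lexeme as well.
nlp.Defaults.stop_words.remove("not")
nlp.vocab["not"].is_stop = False

# Hypothetical sample sentence; keep only the tokens not flagged as stop words.
doc = nlp("Nick does not play football")
kept = [token.text for token in doc if not token.is_stop]
print(kept)
```

With these changes, "football" is filtered out and "not" is kept.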
You need to separate your word lists. One should be for single words and another should be for phrases. And then you need to convert copy_phrase_list to a string and return it. Remove all your for loops and add the following for loop.
Using spaCy 2.0.11, you can update its stop-word set using one of the following:
To add a single stopword:
    import spacy
    nlp = spacy.load("en")
    nlp.Defaults.stop_words.add("my_new_stopword")
To add several stopwords at once:
    import spacy
    nlp = spacy.load("en")
    nlp.Defaults.stop_words |= {"my_new_stopword1", "my_new_stopword2"}
To remove a single stopword:
    import spacy
    nlp = spacy.load("en")
    nlp.Defaults.stop_words.remove("whatever")
To remove several stopwords at once:
    import spacy
    nlp = spacy.load("en")
    nlp.Defaults.stop_words -= {"whatever", "whenever"}
Note: To see the current set of stopwords, use:
print(nlp.Defaults.stop_words)
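Since nlp.Defaults.stop_words is an ordinary Python set, the four operations above follow normal set semantics. The sketch below uses a small stand-in set (the name stop_words and its contents are hypothetical, not spaCy's real list) to show one detail worth knowing: remove() raises KeyError when the word is absent, while discard() and -= do not.

```python
# Stand-in for nlp.Defaults.stop_words; contents are made up for illustration.
stop_words = {"the", "a", "whatever"}

stop_words.add("my_new_stopword")                        # add one word
stop_words |= {"my_new_stopword1", "my_new_stopword2"}   # add several words
stop_words.remove("whatever")                            # remove one word

# Removing a word that is no longer present raises KeyError.
try:
    stop_words.remove("whatever")
    second_remove_ok = True
except KeyError:
    second_remove_ok = False

stop_words.discard("whatever")            # no error if the word is absent
stop_words -= {"whatever", "whenever"}    # no error for absent words either

print(second_remove_ok)    # False
print(sorted(stop_words))
# ['a', 'my_new_stopword', 'my_new_stopword1', 'my_new_stopword2', 'the']
```

If you are not sure whether a word is in the set, discard() or -= is the safer choice.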
Update: It was noted in the comments that this fix only affects the current execution. To update the model, you can use the methods nlp.to_disk("/path") and nlp.from_disk("/path") (further described at https://spacy.io/usage/saving-loading).