Adding words to scikit-learn's CountVectorizer's stop list

1 Answer

According to the source code for sklearn.feature_extraction.text, the full list of ENGLISH_STOP_WORDS (actually a frozenset, imported from stop_words) is exposed through __all__. Therefore, if you want to use that list plus some extra words, you could do something like:

from sklearn.feature_extraction import text
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

(where my_additional_stop_words is any sequence of strings) and use the result as the stop_words argument. This input to CountVectorizer.__init__ is parsed by _check_stop_list, which will pass the new frozenset straight through.
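For a fuller picture, here is a minimal sketch of the whole round trip. The extra words ("foo", "bar") and the two sample documents are made up for illustration. The result is converted to a list before being passed in; that is a precaution rather than part of the approach above, since newer scikit-learn releases appear to validate the type of stop_words and may reject a frozenset.

```python
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical extra stop words, chosen only for this example.
my_additional_stop_words = ["foo", "bar"]

# frozenset.union returns a new frozenset; the built-in list is left unchanged.
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

# Convert to a list as a precaution (assumption: some versions only accept
# "english", a list, or None for stop_words).
vectorizer = CountVectorizer(stop_words=list(stop_words))

# Made-up sample documents.
docs = ["the fox and the foo", "the dog and the bar"]
vectorizer.fit(docs)

# vocabulary_ maps each kept token to its column index.
vocabulary = set(vectorizer.vocabulary_)
print(sorted(vocabulary))
```

Both the built-in English words and the custom additions should be absent from the fitted vocabulary, while ordinary words such as "fox" and "dog" remain.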

152

answered Sep 22 '22 17:09

jonrsharpe


Tags:

python

scikit-learn

stop-words

statsNoob
