tf-idf feature weights using sklearn.feature_extraction.text.TfidfVectorizer

This page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:

As tf–idf is very often used for text features, there is also another class called TfidfVectorizer that combines all the options of CountVectorizer and TfidfTransformer in a single model.

I then followed the example code and used fit_transform() on my corpus. How do I get the weight of each feature computed by fit_transform()?

I tried:

In [39]: vectorizer.idf_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-39-5475eefe04c0> in <module>()
----> 1 vectorizer.idf_

AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'

but this attribute is missing.

Thanks

asked May 21 '14 by fast tooth

People also ask

What does Sklearn TfidfVectorizer do?

Scikit-learn's TfidfVectorizer is used to transform a corpus of text into a matrix of tf-idf features. It also provides the capability to preprocess your text data prior to generating the vector representation, making it a highly flexible feature representation module for text.
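For example, a minimal sketch (the corpus and the preprocessing options here are made up for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["The quick brown fox", "The quick lazy dog"]

# lowercase and stop_words are two of the built-in preprocessing options
vectorizer = TfidfVectorizer(lowercase=True, stop_words='english')
X = vectorizer.fit_transform(corpus)  # sparse matrix of tf-idf weights

print(vectorizer.get_feature_names())  # ['brown', 'dog', 'fox', 'lazy', 'quick']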

What is Sklearn feature_extraction?

The sklearn.feature_extraction module can be used to extract features in a format supported by machine learning algorithms from datasets consisting of formats such as text and image.

How does Sklearn calculate TF-IDF?

The formula used to compute the tf-idf for a term t of a document d in a document set is tf-idf(t, d) = tf(t, d) * idf(t), where the idf is computed as idf(t) = log [ n / df(t) ] + 1 (if smooth_idf=False), n is the total number of documents in the document set, and df(t) is the document frequency of t, i.e. the number of documents in the document set that contain the term t.
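As a quick check of that formula (using a made-up three-document corpus), the idf of a term can be recomputed by hand and compared with what TfidfVectorizer stores:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat", "the dog sat", "the cat ran"]

# smooth_idf=False matches idf(t) = log[n / df(t)] + 1;
# norm=None disables the l2 row normalization applied by default
vectorizer = TfidfVectorizer(smooth_idf=False, norm=None)
X = vectorizer.fit_transform(corpus)

# "cat" appears in 2 of the 3 documents, so n = 3 and df(cat) = 2
idf_cat = np.log(3.0 / 2.0) + 1
print(idf_cat)                                         # 1.4054651081081644
print(vectorizer.idf_[vectorizer.vocabulary_['cat']])  # same value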

Which is better, CountVectorizer or TfidfVectorizer?

TF-IDF is generally better than plain count vectors because it not only captures the frequency of words in the corpus but also conveys the importance of each word. Words that are less important for analysis can then be removed, making model building less complex by reducing the input dimensions.
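To make the comparison concrete, here is a small sketch (corpus made up) that also confirms TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer, as the docs quoted in the question state:

import numpy as np
from sklearn.feature_extraction.text import (CountVectorizer,
                                             TfidfTransformer,
                                             TfidfVectorizer)

corpus = ["This is very strange", "This is very nice"]

counts = CountVectorizer().fit_transform(corpus)           # raw term counts
tfidf_two_step = TfidfTransformer().fit_transform(counts)  # counts -> tf-idf
tfidf_one_step = TfidfVectorizer().fit_transform(corpus)   # one shot

# The two-step pipeline and TfidfVectorizer produce the same weights
print(np.allclose(tfidf_two_step.toarray(), tfidf_one_step.toarray()))  # True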


1 Answer

Since version 0.15, the idf weight learned for each feature can be retrieved via the attribute idf_ of the TfidfVectorizer object:

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["This is very strange",
          "This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_
print dict(zip(vectorizer.get_feature_names(), idf))

Output:

{u'is': 1.0,
 u'nice': 1.4054651081081644,
 u'strange': 1.4054651081081644,
 u'this': 1.0,
 u'very': 1.0}

As discussed in the comments, prior to version 0.15 a workaround is to access idf_ through the private _tfidf attribute (an instance of TfidfTransformer) of the vectorizer:

idf = vectorizer._tfidf.idf_
print dict(zip(vectorizer.get_feature_names(), idf))

which should give the same output as above.
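Note that idf_ holds only the global idf weights; the per-document tf-idf weights that fit_transform() computes are the entries of the returned matrix X itself. Continuing from the snippet above, a minimal sketch for the first document:

# X has shape (n_documents, n_features); row i holds the tf-idf
# weights of document i (l2-normalized by default)
weights = X[0].toarray().ravel()
print dict(zip(vectorizer.get_feature_names(), weights))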

answered Sep 20 '22 by YS-L