I have code that runs a basic TF-IDF vectorizer on a collection of documents, returning a D × F sparse matrix, where D is the number of documents and F is the number of terms. No problem there.
But how do I find the TF-IDF score of a specific term in a given document? That is, is there some sort of dictionary mapping terms (in their textual representation) to their column positions in the resulting sparse matrix?
The formula used to compute the tf-idf for a term t of a document d in a document set is tf-idf(t, d) = tf(t, d) * idf(t), and the idf is computed as idf(t) = log[n / df(t)] + 1 (if smooth_idf=False), where n is the total number of documents in the document set and df(t) is the document frequency of t, i.e. the number of documents that contain t.
The TF-IDF of a term is calculated by multiplying its TF and IDF scores. Put plainly, the importance of a term is high when it occurs often in a given document and rarely in others. In short, commonality within a document (measured by TF) is balanced against rarity across documents (measured by IDF).
TF-IDF (term frequency - inverse document frequency) is a bag-of-words model that is effective at capturing the most important words in a text. It is best understood through its two components: the term frequency (TF) and the inverse document frequency (IDF).
In Python, TF-IDF values can be computed with the TfidfVectorizer class from the sklearn (scikit-learn) module.
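To make the formula above concrete, here is a minimal sketch that computes tf-idf by hand and checks it against TfidfVectorizer with smoothing and normalization switched off (the three toy documents are made up for illustration):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the car is fast", "the car is red", "the sky is blue"]  # toy corpus

counts = CountVectorizer().fit_transform(docs).toarray()  # tf(t, d): raw term counts
n = counts.shape[0]                                       # number of documents
df = (counts > 0).sum(axis=0)                             # df(t): documents containing t
idf = np.log(n / df) + 1                                  # idf(t) = log[n / df(t)] + 1
manual = counts * idf                                     # tf-idf(t, d) = tf(t, d) * idf(t)

sk = TfidfVectorizer(smooth_idf=False, norm=None).fit_transform(docs).toarray()
print(np.allclose(manual, sk))                            # True: both agree

Note that TfidfVectorizer's defaults (smooth_idf=True, norm='l2') additionally smooth the idf and L2-normalize each row, so the values it produces out of the box will differ from this raw formula.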
Yes. See the .vocabulary_ attribute on your fitted/transformed TF-IDF vectorizer.
In [1]: from sklearn.datasets import fetch_20newsgroups
In [2]: data = fetch_20newsgroups(categories=['rec.autos'])   # sample corpus
In [3]: from sklearn.feature_extraction.text import TfidfVectorizer
In [4]: cv = TfidfVectorizer()
In [5]: X = cv.fit_transform(data.data)   # D x F sparse tf-idf matrix
In [6]: cv.vocabulary_
It is a dictionary of the form:
{term : column index in the tf-idf matrix}
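So, to get the score of a specific term in a specific document, combine the document's row index with the column index from .vocabulary_. Continuing the session above ('car' and document 0 are just example choices):

In [7]: col = cv.vocabulary_['car']   # column index for the example term 'car'
In [8]: X[0, col]                     # its tf-idf score in the first document (0.0 if absent)

Note that .vocabulary_ is a plain dict, so looking up a term that was not seen during fitting raises a KeyError.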