TfidfVectorizer from sklearn - how to get the matrix

Question

I would like to get the matrix out of a TfidfVectorizer object from sklearn. Here is my code:

from sklearn.feature_extraction.text import TfidfVectorizer
text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

vectorizer = TfidfVectorizer()
vectorizer.fit_transform(text)

Here is what I tried, along with the errors I got back:

vectorizer.toarray()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-117-76146e626284> in <module>()
----> 1 vectorizer.toarray()

AttributeError: 'TfidfVectorizer' object has no attribute 'toarray'

Another attempt:

vectorizer.todense()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-118-6386ee121184> in <module>()
----> 1 vectorizer.todense()

AttributeError: 'TfidfVectorizer' object has no attribute 'todense'

yatu · Accepted Answer

Note that vectorizer.fit_transform returns the document-term matrix that you want to obtain. The vectorizer object itself has no toarray or todense method, so save what fit_transform returns and call todense (or toarray) on that, as it will be in sparse format:

Returns: X : sparse matrix, [n_samples, n_features]. Tf-idf-weighted document-term matrix.

a = vectorizer.fit_transform(text)
a.todense()

matrix([[0.36388646, 0.27674503, 0.27674503, 0.36388646, 0.36388646,
         0.36388646, 0.36388646, 0.42983441],
        [0.        , 0.78980693, 0.        , 0.        , 0.        ,
         0.        , 0.        , 0.61335554],
        [0.        , 0.        , 0.78980693, 0.        , 0.        ,
         0.        , 0.        , 0.61335554]])
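
To read the matrix above it helps to know which word each column stands for. The following is a minimal sketch, assuming the default TfidfVectorizer settings used in the question; the variable names X, dense and terms are only illustrative. It uses the fitted vocabulary_ attribute, which maps each term to its column index, and toarray, which should give a plain NumPy array.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

vectorizer = TfidfVectorizer()
# The matrix is the return value of fit_transform, not the vectorizer itself.
X = vectorizer.fit_transform(text)

# Convert the sparse result to a dense NumPy array.
dense = X.toarray()

# vocabulary_ maps term -> column index; sort the terms by that index
# so that terms[j] labels column j of the matrix.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Print the non-zero weights of each document next to their terms.
for row in dense:
    print({t: round(float(v), 3) for t, v in zip(terms, row) if v > 0})
```

With the three sentences from the question, the columns come out in alphabetical order of the lowercased words, which is why the last column ("the") is non-zero in every row.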

YOLO · Answer

.fit_transform itself returns a document-term matrix. So you can do:

matrix = vectorizer.fit_transform(text)

Use matrix.todense() to convert the sparse matrix to a dense one.
matrix.shape will give you the shape of the matrix.
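
As a rough sketch of the points above, the returned object can be inspected before converting it. The name tfidf is just a placeholder, and scipy.sparse.issparse is used here only to confirm the sparse format; for large corpora it is usually better to keep the result sparse and convert only when a dense array is really needed.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(text)

# The result is a SciPy sparse matrix, so only non-zero entries are stored.
print(issparse(tfidf))   # whether the result is in sparse format
print(tfidf.shape)       # (number of documents, number of distinct terms)
print(tfidf.nnz)         # number of stored non-zero weights

# Convert to a dense NumPy array only when needed.
dense = tfidf.toarray()
print(type(dense))
```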


Tags:

python

scikit-learn

tf-idf

tfidfvectorizer

user1700890

