 

A transition from CountVectorizer to TfidfTransformer in sklearn

I am processing a huge amount of text data in sklearn. First I need to vectorize the text content (word counts) and then apply a TfidfTransformer. The following code doesn't seem to pass the output of CountVectorizer into TfidfTransformer.

TEXT = [data[i].values()[3] for i in range(len(data))]

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

vectorizer = CountVectorizer(min_df=0.01,max_df = 2.5, lowercase = False, stop_words = 'english')

X = vectorizer(TEXT)
transformer = TfidfTransformer(X)
X = transformer.fit_transform()

When I run this code, I get this error:

Traceback (most recent call last):
  File "nlpQ2.py", line 27, in <module>
    X = vectorizer(TEXT)
TypeError: 'CountVectorizer' object is not callable

I thought I had already vectorized the text into a matrix. Is there a transition step that I have missed? Thank you!!

yearntolearn asked Jul 30 '16 17:07


2 Answers

This line

X = vectorizer(TEXT)

does not produce the vectorizer's output, and it is the line raising the exception (the error has nothing to do with TF-IDF itself): a CountVectorizer object is not callable, so you need to call its fit_transform method instead. Your next call is also wrong: the data must be passed as an argument to fit_transform, not to the TfidfTransformer constructor, and fit_transform cannot be called without it.

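X = vectorizer.fit_transform(TEXT)   # learn the vocabulary, return a sparse count matrix
transformer = TfidfTransformer()     # settings go to the constructor, not the data
X = transformer.fit_transform(X)     # reweight the counts as TF-IDF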
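For reference, here is a minimal self-contained sketch of the corrected two-step flow. The toy corpus `docs` is a hypothetical stand-in for the question's `TEXT`. The question's `min_df`/`max_df` settings are left out on purpose: the scikit-learn docs describe a float `max_df` as a proportion of documents in the range 0.0 to 1.0, so a value like 2.5 is worth double-checking. The last step compares the result with `TfidfVectorizer`, which combines both steps in a single estimator.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus standing in for the question's TEXT list
docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]

# Step 1: learn the vocabulary and build a sparse document-term count matrix
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Step 2: hyperparameters go to the constructor; the count matrix goes to fit_transform
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(counts)

# TfidfVectorizer performs both steps at once; with matching settings the results agree
combined = TfidfVectorizer(stop_words="english").fit_transform(docs)

print(counts.shape, tfidf.shape)
print(np.allclose(tfidf.toarray(), combined.toarray()))
```

With the default `norm="l2"`, each row of `tfidf` has unit length, which is why the two-step result and the one-step `TfidfVectorizer` result can be compared directly.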

lejlot answered Sep 25 '22 23:09


You're probably looking for a pipeline, perhaps something like this:

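from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
])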

or

from sklearn.pipeline import make_pipeline
pipeline = make_pipeline(CountVectorizer(), TfidfTransformer())

Then call the usual methods on this pipeline (e.g., fit, fit_transform, transform, and so forth).

See this example also.
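A minimal sketch of using such a pipeline end to end, assuming a recent scikit-learn. The corpora `train_docs` and `new_docs` are hypothetical toy data. `fit_transform` learns the vocabulary and the IDF weights from the training texts, and `transform` reuses them on new texts, so words never seen during fitting are simply ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpora
train_docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]
new_docs = ["a cat and a dog", "completely unseen words"]

pipeline = make_pipeline(CountVectorizer(), TfidfTransformer())

# Fit both steps on the training texts and get their TF-IDF matrix
X_train = pipeline.fit_transform(train_docs)

# Reuse the learned vocabulary and IDF weights on new texts
X_new = pipeline.transform(new_docs)

# make_pipeline names each step after its lowercased class name
vocab = pipeline.named_steps["countvectorizer"].vocabulary_
idf = pipeline.named_steps["tfidftransformer"].idf_

print(X_train.shape, X_new.shape)
print(len(vocab) == len(idf))
```

The default token pattern drops one-letter tokens such as "a", and "and" never occurs in `train_docs`, so the first new document only gets weights for "cat" and "dog", while the second one becomes an all-zero row.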

Ami Tavory answered Sep 22 '22 23:09