A transition from CountVectorizer to TfidfTransformer in sklearn

Question

I am processing a huge amount of text data in sklearn. First I need to vectorize the text content (word counts) and then apply a TfidfTransformer. I have the following code, but it doesn't seem to pass the output of CountVectorizer as the input to TfidfTransformer.

TEXT = [data[i].values()[3] for i in range(len(data))]

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

vectorizer = CountVectorizer(min_df=0.01,max_df = 2.5, lowercase = False, stop_words = 'english')

X = vectorizer(TEXT)
transformer = TfidfTransformer(X)
X = transformer.fit_transform()

When I run this code, I get this error:

Traceback (most recent call last):
  File "nlpQ2.py", line 27, in <module>
    X = vectorizer(TEXT)
TypeError: 'CountVectorizer' object is not callable

I thought I had vectorized the text and that it was now a matrix. Is there a transition step I have missed? Thank you!

lejlot · Accepted Answer

This line

X = vectorizer(TEXT)

does not produce the output of the vectorizer, because a CountVectorizer object is not callable. This is the line raising the exception; it has nothing to do with TF-IDF itself. You are supposed to call fit_transform instead. Your next call is also wrong: you have to pass the data as an argument to fit_transform, not to the constructor.

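X = vectorizer.fit_transform(TEXT)  # learn the vocabulary, return a sparse matrix of term counts
transformer = TfidfTransformer()    # no data goes into the constructor
X = transformer.fit_transform(X)    # learn idf weights from the counts, return the TF-IDF matrix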
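For reference, here is a minimal, self-contained run of the corrected two-step approach on a made-up three-document corpus (not the question's data). It also recomputes the weighting by hand to show what TfidfTransformer does with the counts under its defaults (norm='l2', use_idf=True, smooth_idf=True): each raw count is multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t, and each row is then scaled to unit Euclidean length.

```python
# Minimal sketch of CountVectorizer -> TfidfTransformer on a made-up toy corpus.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Step 1: learn the vocabulary and count terms (scipy.sparse matrix, docs x terms).
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Step 2: the transformer is constructed empty; the counts go to fit_transform.
# Defaults: norm="l2", use_idf=True, smooth_idf=True, sublinear_tf=False.
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(counts)

# Recompute the default weighting by hand:
# idf(t) = ln((1 + n) / (1 + df(t))) + 1, then each row is scaled to unit L2 norm.
dense = counts.toarray().astype(float)
n_docs = dense.shape[0]
df = (dense > 0).sum(axis=0)
idf = np.log((1 + n_docs) / (1 + df)) + 1
manual = dense * idf
manual /= np.linalg.norm(manual, axis=1, keepdims=True)

print(counts.shape)                          # (3, 10)
print(np.allclose(tfidf.toarray(), manual))  # True
```

One more thing the corrected code inherits from the question: a float max_df is interpreted as a proportion of documents and is expected to lie between 0.0 and 1.0 (an integer max_df is an absolute document count). A value like max_df = 2.5 does not express a meaningful limit, and recent scikit-learn releases reject it with a parameter-validation error, so a value such as 0.95, or the default of 1.0, is probably closer to what was intended.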

Ami Tavory · Answer

You're probably looking for a pipeline, perhaps something like this:

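from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

pipeline = Pipeline([
    ('vect', CountVectorizer()),    # step 1: raw term counts
    ('tfidf', TfidfTransformer()),  # step 2: TF-IDF reweighting of those counts
])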

or

from sklearn.pipeline import make_pipeline
pipeline = make_pipeline(CountVectorizer(), TfidfTransformer())

You can then call the usual estimator methods (e.g., fit, transform, fit_transform) on this pipeline, just as you would on a single estimator.

See this example also.


Tags: python, vectorization, scikit-learn, tf-idf

yearntolearn
