I'm using GridSearchCV and a pipeline to classify some text documents. A code snippet:
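from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
# Vectorizer followed by an SVM; probability=True is needed for predict_proba
clf = Pipeline([('vect', TfidfVectorizer()), ('clf', SVC())])
parameters = {'vect__ngram_range': [(1, 2)], 'vect__min_df': [2], 'vect__stop_words': ['english'],
              'vect__lowercase': [True], 'vect__norm': ['l2'], 'vect__analyzer': ['word'], 'vect__binary': [True],
              'clf__kernel': ['rbf'], 'clf__C': [100], 'clf__gamma': [0.01], 'clf__probability': [True]}
grid_search = GridSearchCV(clf, parameters, n_jobs=-2, refit=True, cv=10)
grid_search.fit(corpus, labels)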
My problem is that when I call grid_search.predict_proba(new_doc) and then try to find out which classes the probabilities correspond to with grid_search.classes_, I get the following error:
AttributeError: 'GridSearchCV' object has no attribute 'classes_'
What have I missed? I thought that if the last "step" in the pipeline was a classifier, then the return of GridSearchCV is also a classifier. Hence one can use the attributes of that classifier, e.g. classes_.
As mentioned in the comments above, grid_search.best_estimator_.classes_ raised an error because best_estimator_ is the fitted pipeline, which had no classes_ attribute. However, by first accessing the classifier step of the pipeline, I was able to use its classes_ attribute. Here is the solution:
grid_search.best_estimator_.named_steps['clf'].classes_
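To make the mapping between probabilities and classes concrete, here is a minimal, self-contained sketch. The toy corpus, the "ham"/"spam" labels, the reduced parameter grid, and cv=2 are all made up for illustration and are not from the question. The columns of predict_proba follow the order of classes_ on the final classifier step, so zipping the two gives a probability per label. More recent scikit-learn releases also appear to expose classes_ directly on the pipeline and on the fitted search object, but the sketch sticks to the named_steps route used above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data, purely illustrative (not the asker's corpus).
spam_docs = [
    "win cash now", "cheap pills online", "claim your free prize",
    "win a free holiday", "cash prize waiting", "cheap loans now",
    "free pills offer", "claim cash online",
]
ham_docs = [
    "meeting at noon", "see you at lunch", "project report attached",
    "lunch meeting tomorrow", "report due tomorrow", "call me at noon",
    "project call today", "see attached notes",
]
corpus = spam_docs + ham_docs
labels = ["spam"] * len(spam_docs) + ["ham"] * len(ham_docs)

clf = Pipeline([("vect", TfidfVectorizer()), ("clf", SVC())])
# A single-point grid, mirroring the style of the question.
parameters = {
    "clf__kernel": ["rbf"],
    "clf__C": [100],
    "clf__gamma": [0.01],
    "clf__probability": [True],  # required for predict_proba on SVC
}
grid_search = GridSearchCV(clf, parameters, refit=True, cv=2)
grid_search.fit(corpus, labels)

# Class labels live on the final classifier step of the refitted pipeline.
classes = grid_search.best_estimator_.named_steps["clf"].classes_

# predict_proba expects an iterable of documents, hence the list.
proba = grid_search.predict_proba(["free cash prize"])

# Column i of proba corresponds to classes[i].
proba_by_class = dict(zip(classes.tolist(), proba[0].tolist()))
print(proba_by_class)
```

The printed dictionary has one entry per label. The exact probabilities depend on the data and on the SVC's internal probability calibration, so they are not shown here.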