Retrieve misclassified documents using scikit-learn

I'd like to know whether the scikit-learn Python module has a built-in function for retrieving misclassified documents.

It's simple, and I usually write it myself: I compare the predicted and test label vectors, then retrieve the corresponding documents from the test document array. But I'm asking whether there is built-in functionality for this, so I don't have to copy the same code into every Python script I write.

Hady Elsahar asked Aug 28 '14 14:08


1 Answer

If you have a list of true labels y_test for a set of documents, e.g. ["ham", "spam", "spam", "ham"] and you convert that to a NumPy array, then you can compare it with the predictions in a one-liner:

import numpy as np

y_test = np.asarray(y_test)
# np.where returns a tuple of index arrays; take the first (and only) one
misclassified = np.where(y_test != clf.predict(X_test))[0]

Now misclassified is an array of indices into X_test.
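To get from those indices back to the documents themselves, something like the following should work. It's only a sketch: docs_test, y_pred and the sample strings are made-up names and data, with y_pred standing in for the output of clf.predict(X_test), and it assumes docs_test is in the same order as y_test.

```python
import numpy as np

# Hypothetical data: raw test documents, their true labels, and predicted
# labels (y_pred stands in for clf.predict(X_test)).
docs_test = ["meeting at noon", "win a free prize", "cheap pills online", "lunch tomorrow?"]
y_test = np.asarray(["ham", "spam", "spam", "ham"])
y_pred = np.asarray(["ham", "ham", "spam", "spam"])

# Indices of the test samples whose predicted label differs from the true one.
misclassified = np.where(y_test != y_pred)[0]

# Pair each misclassified document with its true and predicted label.
errors = [(docs_test[i], y_test[i], y_pred[i]) for i in misclassified]
for doc, true_label, predicted_label in errors:
    print(f"{doc!r}: true={true_label}, predicted={predicted_label}")
```

If the documents are kept in a NumPy array rather than a list, np.asarray(docs_test)[misclassified] selects them in one step.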

@eickenberg is right: this kind of functionality is not implemented in scikit-learn, because users are expected to be familiar enough with NumPy to do it themselves in a few lines of code.

Fred Foo answered Nov 01 '22 13:11