I'd like to know whether the scikit-learn Python module has a built-in function that retrieves misclassified documents.
It's simple, and I usually write it myself by comparing the predicted and test label vectors and retrieving the corresponding documents from the test document array. But I'm asking whether there is built-in functionality for this, so I don't have to copy the same code into every Python script I write.
If you have a list of true labels y_test for a set of documents, e.g. ["ham", "spam", "spam", "ham"], and you convert that to a NumPy array, then you can compare it with the predictions in a one-liner:
import numpy as np
y_test = np.asarray(y_test)
# np.where returns a tuple of arrays; [0] picks out the index array itself
misclassified = np.where(y_test != clf.predict(X_test))[0]
Now misclassified is an array of indices into X_test.
@eickenberg is right, this kind of stuff is not implemented in scikit-learn because users are expected to be familiar enough with NumPy to do it themselves in a few lines of code.