
Display the incorrectly classified instances

I am using scikit-learn to build an SVM classifier. I want to improve its accuracy by inspecting the incorrectly classified instances and working out why they were misclassified. Is there a way to display the incorrectly classified instances?

Ophilia asked Jan 06 '23 18:01


2 Answers

Is there a way to display the incorrectly classified instances?

Yes, with a bit of indexing. Below is an example, but the details depend on the form of your classifier's input and output.

The simple case is when the output is a single value, so you can easily compare whether an instance has been correctly classified or not. For example, let's gather some data and train a binary classifier:

>>> from sklearn import datasets, model_selection, svm
>>> X, y = datasets.make_classification()
>>> X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y)
>>> clf = svm.LinearSVC().fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)

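You can compare y_test and y_pred directly because each prediction is a single value; this holds for binary and ordinary multiclass labels alike. If you are training a multilabel (multi-output) model, where each prediction is a vector of labels, the comparison is no longer one value per instance and you should compare label by label instead.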

>>> misclassified_samples = X_test[y_test != y_pred]
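For the case where each prediction is a row of labels rather than a single value, the comparison can be sketched as follows. The arrays here are made-up toy data, not output from a real classifier, and treating an instance as misclassified when any one of its labels is wrong is an assumption; you may prefer a different rule.

```python
import numpy as np

# Toy multi-output labels (made up for illustration): 3 instances, 2 labels each.
y_test = np.array([[1, 0], [0, 1], [1, 1]])
y_pred = np.array([[1, 0], [1, 1], [1, 1]])

# Element-wise comparison gives one boolean per (instance, label) pair.
per_label_errors = y_test != y_pred

# Assumption: an instance counts as misclassified if any of its labels is wrong.
row_mask = per_label_errors.any(axis=1)
wrong_rows = np.flatnonzero(row_mask)

print(wrong_rows)                    # positions of instances with a wrong label
print(per_label_errors[wrong_rows])  # which labels were wrong for those instances
```

The row mask can be used to index X_test in the same way as the one-dimensional mask above.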

If needed, you can convert the boolean mask to indices too.

>>> import numpy as np
>>> np.flatnonzero(y_test != y_pred)
array([ 0, 20, 22])
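Putting the pieces together, a minimal end-to-end sketch might look like the following. The fixed random_state values and n_samples=200 are assumptions added so the run is repeatable; the indices you get will depend on your own data and split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data; the seeds and sample count are assumptions for repeatability.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Positions (within the test set) where the prediction is wrong.
wrong = np.flatnonzero(y_test != y_pred)

# Show each misclassified instance with its true and predicted label.
for i in wrong:
    print(f"test index {i}: true={y_test[i]}, predicted={y_pred[i]}")
    print("  features:", np.round(X_test[i], 2))

print(f"{len(wrong)} of {len(y_test)} test instances misclassified")
```

Note that these indices refer to positions in X_test, not in the original X, because train_test_split shuffles the data before splitting.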
R. Max answered Jan 10 '23 07:01


I'll assume you are using a linear SVM. If not, the procedure is very similar.

from sklearn.svm import LinearSVC

X_train = your_train_data
y_train = your_train_labels
X_test = your_test_data  # should be around 30% of your data
y_test = your_test_labels

svm = LinearSVC()
svm.fit(X_train, y_train)

# Predict each test instance in turn and report the ones that come out wrong.
for item, label in zip(X_test, y_test):
    result = svm.predict([item])[0]  # predict returns an array; take its only element
    if result != label:
        print("predicted label %s, but true label is %s" % (result, label))

This prints every error the classifier made on the test data.

Farseer answered Jan 10 '23 07:01