Python scikit learn multi-class multi-label performance metrics?

I ran a Random Forest classifier on my multi-class, multi-label output variable and got the output below.

My y_test values:


     Degree  Nature
762721       1       7
548912       0       6
727126       1      12
14880        1      12
189505       1      12
657486       1      12
461004       1       0
31548        0       6
296674       1       7
121330       0      17


Predicted output:

[[  1.   7.]
 [  0.   6.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.   0.]
 [  0.   6.]
 [  1.   7.]
 [  0.  17.]]

Now I want to check the performance of my classifier. I read that Hamming loss or jaccard_similarity_score are good metrics for multi-class multi-label problems, but when I tried to calculate them I got a ValueError.

Error:
ValueError: multiclass-multioutput is not supported

The lines I tried:

from sklearn.metrics import hamming_loss, jaccard_similarity_score

print(hamming_loss(y_test, RF_predicted))
print(jaccard_similarity_score(y_test, RF_predicted))

Thanks,

niranjan asked Aug 01 '16

People also ask

How do you calculate accuracy in multi-label classification?

Accuracy is simply the number of correct predictions divided by the total number of examples. If we consider a prediction correct only when the predicted binary vector exactly equals the ground-truth binary vector, then a model that gets one of four label vectors exactly right has an accuracy of 1 / 4 = 0.25 = 25%.
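
A rough sketch of that exact-match (subset accuracy) idea, using made-up binary label matrices y_true and y_pred:

import numpy as np

# hypothetical ground-truth and predicted binary label vectors (4 samples, 3 labels)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1]])

# a prediction counts as correct only if the whole row matches exactly
exact_match = np.all(y_true == y_pred, axis=1)
print(exact_match.mean())  # 1 exact match out of 4 rows -> 0.25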

How do you find the accuracy of a multi-class classification in Python?

To calculate accuracy, use the following formula: (TP+TN)/(TP+TN+FP+FN). Misclassification Rate: It tells you what fraction of predictions were incorrect. It is also known as Classification Error. You can calculate it using (FP+FN)/(TP+TN+FP+FN) or (1-Accuracy).
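
A minimal sketch of those two formulas, with made-up counts for TP, TN, FP and FN:

# hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)                 # 85 / 100 = 0.85
misclassification_rate = (FP + FN) / (TP + TN + FP + FN)   # 15 / 100 = 0.15
print(accuracy, misclassification_rate)                    # misclassification_rate == 1 - accuracy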

How do you evaluate multilabel classification?

For a multilabel classification, we compute the number of False Positives and False Negative per instance and then average it over the total number of training instances.
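
One way to read that description, sketched on hypothetical binary indicator matrices (with per-label normalization this average is exactly the Hamming loss):

import numpy as np

# hypothetical indicator matrices: rows are instances, columns are labels
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 1, 0], [0, 1, 0]])

fp = np.sum((y_pred == 1) & (y_true == 0), axis=1)  # false positives per instance
fn = np.sum((y_pred == 0) & (y_true == 1), axis=1)  # false negatives per instance
per_instance = (fp + fn) / y_true.shape[1]          # error rate per instance
print(per_instance.mean())                          # averaged over instances -> 1/3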

What is sklearn metrics accuracy_score?

In Python, the accuracy_score function of the sklearn.metrics package calculates the accuracy score for a set of predicted labels against the true labels.
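
For reference, a minimal accuracy_score sketch on 1-d label vectors (note that, like hamming_loss, it does not accept the multiclass-multioutput arrays from the question directly):

from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 0]
print(accuracy_score(y_true, y_pred))  # 3 correct out of 5 -> 0.6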


1 Answer

To calculate the Hamming loss for the multiclass-multilabel case that sklearn does not support directly, you could:

import numpy as np
y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])
# fraction of individual label entries that differ
np.sum(np.not_equal(y_true, y_pred)) / float(y_true.size)

0.75
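
Equivalently (a sketch, reusing y_true and y_pred from above), you could apply sklearn's hamming_loss to each output column separately, since a single column is plain multiclass, and average the results:

from sklearn.metrics import hamming_loss

per_column = [hamming_loss(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]
print(np.mean(per_column))  # 0.75, matching the manual calculation above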

You can also get the confusion_matrix for each of the two output columns like so:

from sklearn.metrics import confusion_matrix, precision_score
np.random.seed(42)

# simulate two output columns: one binary, one multiclass
y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T

[[0 4]
 [1 4]
 [0 4]
 [0 4]
 [0 2]
 [1 4]
 [0 3]
 [0 2]
 [0 3]
 [1 3]]

y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T

[[1 2]
 [1 2]
 [1 4]
 [1 4]
 [0 4]
 [0 3]
 [1 4]
 [1 3]
 [1 3]
 [0 4]]

confusion_matrix(y_true[:, 0], y_pred[:, 0])

[[1 6]
 [2 1]]

confusion_matrix(y_true[:, 1], y_pred[:, 1])

[[0 1 1]
 [0 1 2]
 [2 1 2]]

You could also calculate the precision_score like so (or the recall_score in a similar way):

precision_score(y_true[:, 0], y_pred[:, 0])

0.142857142857
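
In the same spirit, a sketch of recall_score per column (the second column is multiclass, so an averaging strategy such as 'macro' is needed):

from sklearn.metrics import recall_score

print(recall_score(y_true[:, 0], y_pred[:, 0]))                   # binary column: 1 of 3 positives recovered -> 0.333...
print(recall_score(y_true[:, 1], y_pred[:, 1], average='macro'))  # multiclass column, macro-averaged
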
Stefan answered Sep 18 '22