 

Getting the accuracy for multi-label prediction in scikit-learn


In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

This way of computing the accuracy is sometimes called, perhaps less ambiguously, the exact match ratio (1):

$$ \text{Exact Match Ratio}, MR = \frac{1}{n} \sum_{i=1}^{n} I(Y_i = Z_i) $$

where \\(Y_i\\) is the true set of labels and \\(Z_i\\) the predicted set of labels for the i-th sample, and \\(I\\) is the indicator function.

Is there any way in scikit-learn to compute the other typical definition of accuracy, namely

$$ \text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|} $$

(as defined in (1) and (2), and less ambiguously referred to as the Hamming score (4), since it is closely related to the Hamming loss, or as label-based accuracy)?
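For a minimal illustration of the difference between the two definitions (the indicator arrays below are made up for this example):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[0, 1],
                   [1, 1]])
y_pred = np.array([[0, 1],
                   [1, 0]])

# Subset accuracy (exact match ratio): only the first sample matches exactly
print(accuracy_score(y_true, y_pred))  # 0.5

# Hamming score (label-based accuracy), computed by hand:
# sample 1: |{1} ∩ {1}| / |{1} ∪ {1}| = 1
# sample 2: |{0, 1} ∩ {0}| / |{0, 1} ∪ {0}| = 1/2
# mean = (1 + 1/2) / 2 = 0.75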


(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

(2) Tsoumakas, Grigorios, and Ioannis Katakis. "Multi-label classification: An overview." Dept. of Informatics, Aristotle University of Thessaloniki, Greece (2006).

(3) Ghamrawi, Nadia, and Andrew McCallum. "Collective multi-label classification." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.

(4) Godbole, Shantanu, and Sunita Sarawagi. "Discriminative methods for multi-labeled classification." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 22-30.

Asked Aug 27 '15 by Franck Dernoncourt

People also ask

How do you calculate accuracy for multi-label classification?

We can sum the per-class values to obtain global FP, FN, TP, and TN counts for the classifier as a whole, and then compute a global accuracy score with the usual formula, Accuracy = (TP + TN) / (TP + TN + FP + FN). For example, with 4 + 3 correct predictions out of 4 + 3 + 2 + 3 = 12 total: Accuracy = 7 / 12 ≈ 0.583 ≈ 58%.
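As a quick sanity check of that arithmetic (the counts below are the illustrative global totals from the snippet above, not derived from real data; the split between TP and TN, and between FP and FN, is arbitrary here since only the sums matter):

# Hypothetical global counts summed across all classes
tp, tn = 4, 3   # correctly classified samples
fp, fn = 2, 3   # misclassified samples

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.5833... ≈ 58%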

How do you calculate accuracy in Sklearn?

Here we can use scikit-learn's accuracy_score to calculate the accuracy. For example, with y_pred = [0, 5, 2, 4] as the predicted values and y_true = [0, 1, 2, 3] as the given true values, accuracy_score(y_true, y_pred) returns the fraction of predictions that match the true values.
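A minimal, self-contained sketch of that call, reusing the illustrative label values from the snippet above:

from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 3]   # given true labels
y_pred = [0, 5, 2, 4]   # predicted labels; only positions 0 and 2 match

print(accuracy_score(y_true, y_pred))  # 0.5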

Which library is used to check the accuracy of a predictive model in Sklearn?

Accuracy using Sklearn's accuracy_score(): you can also get the accuracy score in Python using sklearn.metrics' accuracy_score() function, which takes the true labels and the predicted labels as arguments and returns the accuracy as a float value.

What is accuracy score in Scikit learn?

Accuracy score is the proportion of samples that were correctly classified out of the total number of samples, so it ranges from 0 to 1.


2 Answers

You can write a version yourself; here is an example that ignores the sample_weight and normalize arguments.

import numpy as np

y_true = np.array([[0,1,0],
                   [0,1,1],
                   [1,0,1],
                   [0,0,1]])

y_pred = np.array([[0,1,1],
                   [0,1,1],
                   [0,1,0],
                   [0,0,0]])

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
    http://stackoverflow.com/q/32239577/395857
    '''
    acc_list = []
    for i in range(y_true.shape[0]):
        set_true = set( np.where(y_true[i])[0] )
        set_pred = set( np.where(y_pred[i])[0] )
        #print('\nset_true: {0}'.format(set_true))
        #print('set_pred: {0}'.format(set_pred))
        tmp_a = None
        if len(set_true) == 0 and len(set_pred) == 0:
            tmp_a = 1
        else:
            tmp_a = len(set_true.intersection(set_pred))/\
                    float( len(set_true.union(set_pred)) )
        #print('tmp_a: {0}'.format(tmp_a))
        acc_list.append(tmp_a)
    return np.mean(acc_list)

if __name__ == "__main__":
    print('Hamming score: {0}'.format(hamming_score(y_true, y_pred))) # 0.375 (= (0.5+1+0+0)/4)

    # For comparison sake:
    import sklearn.metrics

    # Subset accuracy
    # 0.25 (= 0+1+0+0 / 4) --> 1 if the prediction for one sample fully matches the gold, 0 otherwise.
    print('Subset accuracy: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

    # Hamming loss (smaller is better)
    # $$ \text{HammingLoss}(x_i, y_i) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{xor(x_i, y_i)}{|L|}, $$
    # where
    #  - \\(|D|\\) is the number of samples
    #  - \\(|L|\\) is the number of labels
    #  - \\(y_i\\) is the ground truth
    #  - \\(x_i\\) is the prediction.
    # 0.416666666667 (= (1+0+3+1) / (3*4) )
    print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred)))

Outputs:

Hamming score: 0.375
Subset accuracy: 0.25
Hamming loss: 0.416666666667
Answered by William


A simple summary function:

import numpy as np

def hamming_score(y_true, y_pred):
    # y_true and y_pred are binary indicator arrays (np.ndarray),
    # e.g. the ones defined in the first answer
    return (
        (y_true & y_pred).sum(axis=1) / (y_true | y_pred).sum(axis=1)
    ).mean()

hamming_score(y_true, y_pred)  # 0.375
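One caveat with this one-liner, not covered above: if a sample has neither true nor predicted labels, the denominator is zero and NumPy produces nan, whereas the loop-based version in the first answer scores such a sample as 1. A small guarded variant (the name hamming_score_safe is just illustrative):

import numpy as np

def hamming_score_safe(y_true, y_pred):
    # Per-sample intersection and union sizes of the label sets
    num = (y_true & y_pred).sum(axis=1)
    den = (y_true | y_pred).sum(axis=1)
    # Score empty-vs-empty samples as 1, matching the loop-based version above
    return np.where(den == 0, 1.0, num / np.maximum(den, 1)).mean()

hamming_score_safe(y_true, y_pred)  # 0.375 on the arrays from the first answer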
Answered by nocibambi