In a multilabel classification setting, sklearn.metrics.accuracy_score
only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
This way of computing the accuracy is sometimes called, perhaps less ambiguously, the exact match ratio (1):
Is there any way to compute the other typical accuracy metric in scikit-learn, namely the one defined in (1) and (2), and less ambiguously referred to as the Hamming score (4) (since it is closely related to the Hamming loss) or label-based accuracy?
(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).
(2) Tsoumakas, Grigorios, and Ioannis Katakis. "Multi-label classification: An overview." Dept. of Informatics, Aristotle University of Thessaloniki, Greece (2006).
(3) Ghamrawi, Nadia, and Andrew McCallum. "Collective multi-label classification." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.
(4) Godbole, Shantanu, and Sunita Sarawagi. "Discriminative methods for multi-labeled classification." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 22-30.
We can sum up the values across classes to obtain global TP, TN, FP, and FN counts for the classifier as a whole, which lets us compute a global accuracy score using the usual formula. For example, with TP = 4, TN = 3, FP = 2, and FN = 3: Accuracy = (4 + 3) / (4 + 3 + 2 + 3) = 7/12 ≈ 0.583 = 58.3%.
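The arithmetic above can be sketched as a small helper. Note that the counts (TP = 4, TN = 3, FP = 2, FN = 3) are the illustrative values used in the text, not figures derived from a real dataset:

```python
def global_accuracy(tp, tn, fp, fn):
    """Micro-averaged accuracy from confusion counts pooled across all classes."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts from the text: (4 + 3) / (4 + 3 + 2 + 3) = 7/12
print(global_accuracy(tp=4, tn=3, fp=2, fn=3))  # ≈ 0.583
```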
Here we can use scikit-learn's accuracy_score to calculate the accuracy: y_pred = [0, 5, 2, 4] is the predicted value, y_true = [0, 1, 2, 3] is the given true value, and accuracy_score(y_true, y_pred) computes the accuracy of the predictions against the true values.
You can also get the accuracy score in Python using sklearn.metrics' accuracy_score() function, which takes the true labels and the predicted labels as arguments and returns the accuracy as a float. The accuracy score is the proportion of samples that were correctly classified out of the total number of samples, so it ranges from 0 to 1.
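Using the values from the text, a quick check (assumes scikit-learn is installed):

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 3]  # given true labels
y_pred = [0, 5, 2, 4]  # chosen predicted labels

# Two of the four predictions match the true labels.
print(accuracy_score(y_true, y_pred))                  # 0.5
print(accuracy_score(y_true, y_pred, normalize=False))  # 2 (raw count of correct predictions)
```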
You can write one version yourself; here is an example that does not consider the sample_weight and normalize arguments.
import numpy as np

y_true = np.array([[0, 1, 0],
                   [0, 1, 1],
                   [1, 0, 1],
                   [0, 0, 1]])

y_pred = np.array([[0, 1, 1],
                   [0, 1, 1],
                   [0, 1, 0],
                   [0, 0, 0]])

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
    http://stackoverflow.com/q/32239577/395857
    '''
    acc_list = []
    for i in range(y_true.shape[0]):
        set_true = set(np.where(y_true[i])[0])
        set_pred = set(np.where(y_pred[i])[0])
        if len(set_true) == 0 and len(set_pred) == 0:
            tmp_a = 1
        else:
            tmp_a = len(set_true.intersection(set_pred)) / \
                    float(len(set_true.union(set_pred)))
        acc_list.append(tmp_a)
    return np.mean(acc_list)

if __name__ == "__main__":
    print('Hamming score: {0}'.format(hamming_score(y_true, y_pred)))
    # 0.375 (= (0.5 + 1 + 0 + 0) / 4)

    # For comparison's sake:
    import sklearn.metrics

    # Subset accuracy
    # 0.25 (= (0 + 1 + 0 + 0) / 4) --> 1 if the prediction for one sample
    # fully matches the gold labels, 0 otherwise.
    print('Subset accuracy: {0}'.format(
        sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

    # Hamming loss (smaller is better)
    # $$ \text{HammingLoss}(x_i, y_i) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{xor(x_i, y_i)}{|L|}, $$
    # where
    # - \(|D|\) is the number of samples
    # - \(|L|\) is the number of labels
    # - \(y_i\) is the ground truth
    # - \(x_i\) is the prediction
    # 0.416666666667 (= (1 + 0 + 3 + 1) / (3 * 4))
    print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred)))
Outputs:
Hamming score: 0.375
Subset accuracy: 0.25
Hamming loss: 0.416666666667
A simple summary function:
import numpy as np

def hamming_score(y_true, y_pred):
    # Per-sample |intersection| / |union| of the positive labels, averaged
    # over samples; expects integer (0/1) NumPy arrays.
    return ((y_true & y_pred).sum(axis=1) / (y_true | y_pred).sum(axis=1)).mean()

hamming_score(y_true, y_pred)  # 0.375
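The per-sample intersection-over-union averaged over samples is exactly the sample-averaged Jaccard index, which scikit-learn ships as jaccard_score with average='samples' (available since scikit-learn 0.21). One caveat, as an assumption worth checking for your data: unlike the hand-rolled function above, jaccard_score scores a sample where both the true and predicted label sets are empty as 0 (with a warning) rather than 1.

```python
import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])
y_pred = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]])

# average='samples': |intersection| / |union| per row, averaged over rows.
print(jaccard_score(y_true, y_pred, average='samples'))  # 0.375
```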