 

Getting the accuracy for multi-label prediction in scikit-learn


In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

This way of computing the accuracy is sometimes called, perhaps less ambiguously, the exact match ratio (1):

$$ \text{Exact Match Ratio}, MR = \frac{1}{n} \sum_{i=1}^{n} I(Y_i = Z_i) $$

where \\(Y_i\\) is the true set of labels and \\(Z_i\\) the predicted set of labels for the i-th sample, and \\(I\\) is the indicator function.

Is there any way in scikit-learn to compute the other typical definition of accuracy, namely

$$ \text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|} $$

(as defined in (1) and (2), and less ambiguously referred to as the Hamming score (4), since it is closely related to the Hamming loss, or as label-based accuracy)?
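For a minimal illustration of the difference between the two definitions (the indicator arrays below are made up for this example):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[0, 1],
                   [1, 1]])
y_pred = np.array([[0, 1],
                   [1, 0]])

# Subset accuracy (exact match ratio): only the first sample matches exactly
print(accuracy_score(y_true, y_pred))  # 0.5

# Hamming score (label-based accuracy), computed by hand:
# sample 1: |{1} ∩ {1}| / |{1} ∪ {1}| = 1
# sample 2: |{0, 1} ∩ {0}| / |{0, 1} ∪ {0}| = 1/2
# mean = (1 + 1/2) / 2 = 0.75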


(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

(2) Tsoumakas, Grigorios, and Ioannis Katakis. "Multi-label classification: An overview." Dept. of Informatics, Aristotle University of Thessaloniki, Greece (2006).

(3) Ghamrawi, Nadia, and Andrew McCallum. "Collective multi-label classification." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.

(4) Godbole, Shantanu, and Sunita Sarawagi. "Discriminative methods for multi-labeled classification." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 22-30.

Asked Aug 27 '15 by Franck Dernoncourt

People also ask

How do you calculate accuracy for multi-label classification?

We can sum the per-class values to obtain global FP, FN, TP, and TN counts for the classifier as a whole, and then compute a global accuracy score with the usual formula, Accuracy = (TP + TN) / (TP + TN + FP + FN). For example, with 4 + 3 correct predictions out of 4 + 3 + 2 + 3 = 12 total: Accuracy = 7 / 12 ≈ 0.583 ≈ 58%.
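As a quick sanity check of that arithmetic (the counts below are the illustrative global totals from the snippet above, not derived from real data; the split between TP and TN, and between FP and FN, is arbitrary here since only the sums matter):

# Hypothetical global counts summed across all classes
tp, tn = 4, 3   # correctly classified samples
fp, fn = 2, 3   # misclassified samples

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.5833... ≈ 58%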

How do you calculate accuracy in Sklearn?

Here we can use scikit-learn's accuracy_score to calculate the accuracy. For example, with y_pred = [0, 5, 2, 4] as the predicted values and y_true = [0, 1, 2, 3] as the given true values, accuracy_score(y_true, y_pred) returns the fraction of predictions that match the true values.
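A minimal, self-contained sketch of that call, reusing the illustrative label values from the snippet above:

from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 3]   # given true labels
y_pred = [0, 5, 2, 4]   # predicted labels; only positions 0 and 2 match

print(accuracy_score(y_true, y_pred))  # 0.5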

Which library is used to check the accuracy of a predictive model in Sklearn?

Accuracy using Sklearn's accuracy_score(): you can also get the accuracy score in Python using sklearn.metrics' accuracy_score() function, which takes the true labels and the predicted labels as arguments and returns the accuracy as a float value.

What is accuracy score in Scikit learn?

Accuracy score is the proportion of samples that were correctly classified out of the total number of samples, so it ranges from 0 to 1.


2 Answers

You can write a version yourself; here is an example that ignores the sample_weight and normalize arguments.

import numpy as np

y_true = np.array([[0,1,0],
                   [0,1,1],
                   [1,0,1],
                   [0,0,1]])

y_pred = np.array([[0,1,1],
                   [0,1,1],
                   [0,1,0],
                   [0,0,0]])

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
    http://stackoverflow.com/q/32239577/395857
    '''
    acc_list = []
    for i in range(y_true.shape[0]):
        set_true = set( np.where(y_true[i])[0] )
        set_pred = set( np.where(y_pred[i])[0] )
        #print('\nset_true: {0}'.format(set_true))
        #print('set_pred: {0}'.format(set_pred))
        tmp_a = None
        if len(set_true) == 0 and len(set_pred) == 0:
            tmp_a = 1
        else:
            tmp_a = len(set_true.intersection(set_pred))/\
                    float( len(set_true.union(set_pred)) )
        #print('tmp_a: {0}'.format(tmp_a))
        acc_list.append(tmp_a)
    return np.mean(acc_list)

if __name__ == "__main__":
    print('Hamming score: {0}'.format(hamming_score(y_true, y_pred))) # 0.375 (= (0.5+1+0+0)/4)

    # For comparison sake:
    import sklearn.metrics

    # Subset accuracy
    # 0.25 (= 0+1+0+0 / 4) --> 1 if the prediction for one sample fully matches the gold, 0 otherwise.
    print('Subset accuracy: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

    # Hamming loss (smaller is better)
    # $$ \text{HammingLoss}(x_i, y_i) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{xor(x_i, y_i)}{|L|}, $$
    # where
    #  - \\(|D|\\) is the number of samples
    #  - \\(|L|\\) is the number of labels
    #  - \\(y_i\\) is the ground truth
    #  - \\(x_i\\) is the prediction.
    # 0.416666666667 (= (1+0+3+1) / (3*4) )
    print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred)))

Outputs:

Hamming score: 0.375
Subset accuracy: 0.25
Hamming loss: 0.416666666667
Answered by William


A simple summary function:

import numpy as np

def hamming_score(y_true, y_pred):
    # y_true and y_pred are binary indicator arrays (np.ndarray),
    # e.g. the ones defined in the first answer
    return (
        (y_true & y_pred).sum(axis=1) / (y_true | y_pred).sum(axis=1)
    ).mean()

hamming_score(y_true, y_pred)  # 0.375
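One caveat with this one-liner, not covered above: if a sample has neither true nor predicted labels, the denominator is zero and NumPy produces nan, whereas the loop-based version in the first answer scores such a sample as 1. A small guarded variant (the name hamming_score_safe is just illustrative):

import numpy as np

def hamming_score_safe(y_true, y_pred):
    # Per-sample intersection and union sizes of the label sets
    num = (y_true & y_pred).sum(axis=1)
    den = (y_true | y_pred).sum(axis=1)
    # Score empty-vs-empty samples as 1, matching the loop-based version above
    return np.where(den == 0, 1.0, num / np.maximum(den, 1)).mean()

hamming_score_safe(y_true, y_pred)  # 0.375 on the arrays from the first answer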
Answered by nocibambi