 

scikit-learn roc_auc_score() returns accuracy values

I am trying to compute area under the ROC curve using sklearn.metrics.roc_auc_score using the following method:

roc_auc = sklearn.metrics.roc_auc_score(actual, predicted)

where actual is a binary vector with ground truth classification labels and predicted is a binary vector with classification labels that my classifier has predicted.

However, the value of roc_auc that I am getting is exactly equal to my accuracy (the proportion of samples whose labels are correctly predicted). This is not a one-off thing: I have run my classifier with various parameter settings, and every time the two numbers come out identical.
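For concreteness, here is a small sketch of the situation described, with made-up label vectors (the data is purely illustrative). When `predicted` contains hard 0/1 labels, `roc_auc_score` reduces to `(TPR + TNR) / 2`, which can coincide with plain accuracy:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

# Illustrative ground-truth labels and hard 0/1 predictions
actual = np.array([0, 0, 1, 1, 0, 1])
predicted = np.array([0, 1, 1, 1, 0, 0])  # class decisions, not scores

roc_auc = roc_auc_score(actual, predicted)
acc = accuracy_score(actual, predicted)
print(roc_auc)  # 0.666...
print(acc)      # 0.666... -- identical here
```

With binary inputs the ROC "curve" has only one interior point, so its area is the trapezoid through (0, 0), (FPR, TPR), (1, 1), i.e. the average of sensitivity and specificity.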

What am I doing wrong here?

Muhammad Waqar asked Mar 11 '14 07:03


1 Answer

This is because you are passing in the decisions of your classifier instead of the scores it calculated. There was a question about this on SO recently, and a related pull request to scikit-learn.

The point of a ROC curve (and the area under it) is that you study the tradeoff between the true positive rate and the false positive rate as the classification threshold is varied. By default in a binary classification task, if your classifier's score is > 0.5, then class1 is predicted; otherwise class0 is predicted. As you sweep that threshold, you trace out the ROC curve. The higher up the curve is (the more area under it), the better the classifier. However, to get this curve you need access to the scores of a classifier, not its decisions. Otherwise, whatever the decision threshold is, the decisions stay the same, and the AUC degenerates to a single-point summary much like accuracy.
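A minimal sketch of the fix, assuming a classifier that exposes `predict_proba` (the dataset and model here are stand-ins, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Degenerate: hard 0/1 decisions collapse the ROC curve to one point
auc_from_labels = roc_auc_score(y, clf.predict(X))

# Correct: continuous scores (probability of the positive class)
auc_from_scores = roc_auc_score(y, clf.predict_proba(X)[:, 1])
```

For classifiers without `predict_proba` (e.g. some SVMs), `decision_function` output works just as well, since AUC only depends on the ranking of the scores.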

Which classifier are you using?

mbatchkarov answered Sep 23 '22 18:09