
How to interpret almost perfect accuracy and AUC-ROC but zero f1-score, precision and recall


I am training a logistic regression classifier to separate two classes using Python scikit-learn. The data are extremely imbalanced (about 14300:1). I'm getting almost 100% accuracy and ROC-AUC, but 0% precision, recall, and F1 score. I understand that accuracy is usually not useful for very imbalanced data, but why is the ROC-AUC close to perfect as well?

    from sklearn.metrics import roc_curve, auc

    # Get ROC (classifierUsed2 is the fitted logistic regression model)
    y_score = classifierUsed2.decision_function(X_test)
    false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_score)
    roc_auc = auc(false_positive_rate, true_positive_rate)
    print 'AUC=', roc_auc

Output (1 = class1, 0 = class2):

    Class count:
    0    199979
    1        21

    Accuracy: 0.99992
    Classification report:
                 precision    recall  f1-score   support

              0       1.00      1.00      1.00     99993
              1       0.00      0.00      0.00         7

    avg / total       1.00      1.00      1.00    100000

    Confusion matrix:
    [[99992     1]
     [    7     0]]
    AUC= 0.977116255281

The above is using logistic regression; below is using a decision tree. The confusion matrices look almost identical, but the AUC is a lot different.

Output (1 = class1, 0 = class2):

    Class count:
    0    199979
    1        21

    Accuracy: 0.99987
    Classification report:
                 precision    recall  f1-score   support

              0       1.00      1.00      1.00     99989
              1       0.00      0.00      0.00        11

    avg / total       1.00      1.00      1.00    100000

    Confusion matrix:
    [[99987     2]
     [   11     0]]
    AUC= 0.4999899989
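For reference, here is a minimal, self-contained sketch (my own synthetic example, not the asker's actual pipeline) that typically reproduces the same pattern: near-perfect accuracy and a high AUC next to a zero F1 for the minority class, because the default decision rule never predicts the rare class:

    # Synthetic reproduction sketch: ~10000:1 class imbalance.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import (accuracy_score, classification_report,
                                 confusion_matrix, roc_auc_score)

    X, y = make_classification(n_samples=200000, n_features=20,
                               weights=[0.9999, 0.0001], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = clf.predict(X_test)             # hard labels at the default threshold
    y_score = clf.decision_function(X_test)  # threshold-free scores for the ROC

    print('Accuracy:', accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))   # F1 for class 1 is ~0
    print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
    print('AUC:', roc_auc_score(y_test, y_score))  # typically still high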
asked Jan 09 '16 by KubiK888

People also ask

What does it mean if F1 score is 0?

For a binary classification task, the higher the F1 score the better: 0 is the worst possible value and 1 the best. An F1 of 0 means the classifier has zero precision or zero recall for that class.

How can you interpret AUC ROC curve What is the significance of it?

In general, an AUC of 0.5 suggests no discrimination (i.e., no ability to distinguish patients with and without the disease or condition based on the test), 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 excellent, and more than 0.9 outstanding.

What is a good AUC for a precision-recall curve?

Usually, the AUC is in the range [0.5,1] because useful classifiers should perform better than random. In principle, however, the AUC can also be smaller than 0.5, which indicates that a classifier performs worse than a random classifier.

Is F1 score better than precision and recall?

The F1-score equals precision and recall when the two input metrics are equal. When they differ, the F1-score (their harmonic mean) falls between them but closer to the smaller value, so it penalizes a large gap between precision and recall.
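Since the F1-score is the harmonic mean of precision and recall, a few quick computations make this behavior concrete:

    # F1 is the harmonic mean of precision and recall: it sits between the
    # two values but closer to the smaller one, and is 0 if either input is 0.
    def f1(precision, recall):
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    print(f1(0.8, 0.8))  # 0.8   -> equal inputs give F1 = P = R
    print(f1(1.0, 0.5))  # 0.667 -> pulled toward the smaller value
    print(f1(1.0, 0.0))  # 0.0   -> zero recall forces F1 to 0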


1 Answer

One must understand the crucial difference between AUC ROC and "point-wise" metrics like accuracy, precision, etc.: point-wise metrics are computed at one fixed decision threshold, while the ROC curve is a function of the threshold. Given a model (classifier) that outputs the probability of belonging to each class, we typically predict the class with the highest probability (support). However, we can sometimes get better scores by changing this rule, e.g. requiring one support to be 2 times bigger than the other before classifying a sample as that class. This is often true for imbalanced datasets; by changing the threshold you are actually modifying the learned prior of the classes to better fit your data. The ROC curve asks "what would happen if I changed this threshold to every possible value?", and AUC ROC computes the integral of the resulting curve.

Consequently:

  • high AUC ROC vs. low F1 (or another "point" metric) means that your classifier currently does a bad job, but there is a threshold at which its score is actually pretty decent (a threshold-search sketch follows this list)
  • low AUC ROC and low F1 (or another "point" metric) means that your classifier currently does a bad job, and even fitting a threshold will not change that
  • high AUC ROC and high F1 (or another "point" metric) means that your classifier currently does a decent job, and it would do the same at many other thresholds
  • low AUC ROC vs. high F1 (or another "point" metric) means that your classifier currently does a decent job at its current threshold, but at many other thresholds it is pretty bad
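To make the first bullet concrete, here is a hedged sketch of the threshold search it describes, using scikit-learn's precision_recall_curve. The names clf, X_test, and y_test stand in for the question's fitted classifier and test split; ideally the threshold would be chosen on a separate validation set rather than the test set:

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    # Scores without any thresholding applied (clf is the fitted classifier).
    y_score = clf.decision_function(X_test)

    # precision/recall at every candidate threshold; the two arrays are one
    # element longer than `thresholds`, so the last pair is dropped below.
    precision, recall, thresholds = precision_recall_curve(y_test, y_score)

    # F1 at each threshold (epsilon guards against 0/0 at degenerate points).
    f1_scores = 2 * precision * recall / (precision + recall + 1e-12)
    best = np.argmax(f1_scores[:-1])

    print('best threshold:', thresholds[best])
    print('F1 at that threshold:', f1_scores[best])

    # Replace the default decision rule with the tuned threshold.
    y_pred_tuned = (y_score >= thresholds[best]).astype(int)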
answered Oct 27 '22 by lejlot