
Reason for having high AUC and low accuracy in a balanced dataset

Given a balanced dataset (both classes are the same size), fitting an SVM model to it yields a high AUC (~0.9) but a low accuracy (~0.5).

I have no idea why this would happen. Can anyone explain this case to me?

Asked by Jamin on Jul 15 '16



1 Answer

I recently stumbled upon the same question. Here is what I figured out for myself - let me know if I'm wrong.

Before we ponder why the area under the ROC curve (AUC) can be high while accuracy is low, let's first recapitulate the meanings of these terms.

The receiver operating characteristic (ROC) curve plots the true positive rate TPR(t) against the false positive rate FPR(t) for varying decision thresholds (or prediction cutoffs) t.

TPR and FPR are defined as follows:

TPR = TP / P = TP / (TP+FN) = number of true positives / number of positives
FPR = FP / N = FP / (FP+TN) = number of false positives / number of negatives
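For concreteness, here is a minimal sketch of these two rates in Python; y_true and y_pred are placeholder arrays of 0/1 labels and hard predictions, not anything from the original question:

import numpy as np

def tpr_fpr(y_true, y_pred):
    # Count the four confusion-matrix cells (1 = "positive", 0 = "negative").
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)  # (TPR, FPR); assumes both classes are present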

In ROC analysis, the classifier is assumed to reduce to the following decision rule:

def classifier(observation, t):
    # Observations scoring at or below the threshold t go to the "negative"
    # class A; everything above it goes to the "positive" class B.
    if score_function(observation) <= t:
        return "A"  # "negative" class
    else:
        return "B"  # "positive" class

Think of the decision threshold t as a free parameter that is adjusted when training a classifier. (Not all classifiers have such a straightforward parametrization, but for now stick with logistic regression or simple thresholding, for which there is an obvious choice for such a parameter t.) During the training process, the optimal threshold t* is chosen such that some cost function is minimized.
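As a toy illustration of that last step, one could pick t* on held-out data by maximizing accuracy, i.e. the cost function is simply the misclassification rate. That choice is an assumption for the sketch below; your actual training procedure may minimize something else entirely.

import numpy as np

def choose_threshold(scores, y_true):
    # Try every observed score as a candidate cutoff and keep the one
    # that maximizes accuracy (equivalently, minimizes misclassification).
    thresholds = np.unique(scores)
    accuracies = [np.mean((scores > t).astype(int) == y_true) for t in thresholds]
    return thresholds[int(np.argmax(accuracies))]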

Given the training/test data, note that any choice of the parameter t determines which of the data points are true positives (TP), false positives (FP), true negatives (TN) or false negatives (FN). Hence, any choice of t also determines FPR(t) and TPR(t).

So we've seen the following: an ROC curve is a curve parametrized by the decision threshold t, with x = FPR(t) and y = TPR(t) over all possible values of t.

The area under the resulting ROC curve is called the AUC. It measures, for your training/test data, how well the classifier can discriminate between samples from the "positive" and the "negative" class. A perfect classifier's ROC curve would pass through the optimal point FPR(t*) = 0 and TPR(t*) = 1 and would yield an AUC of 1. A random classifier's ROC curve, however, follows the diagonal FPR(t) = TPR(t), yielding an AUC of 0.5.
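In practice you rarely trace the curve by hand; a minimal, self-contained sketch with scikit-learn (using made-up synthetic labels and scores purely for demonstration) looks like this:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)               # synthetic 0/1 labels
y_score = y_true + rng.normal(scale=0.8, size=200)  # noisy scores, for demo only

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) pair per cutoff
print("AUC =", roc_auc_score(y_true, y_score))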

Finally, accuracy is defined as the ratio of correctly labeled cases to the total number of cases:

accuracy = (TP+TN)/(Total number of cases) = (TP+TN)/(TP+FP+TN+FN)

So how can it be that the AUC is large while the accuracy is low at the same time? Well, this may happen if your classifier achieves good performance on the positive class (high AUC) at the cost of a high false negative rate (i.e. a low number of true positives).
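To make that concrete, here is a small synthetic sketch (not the asker's SVM, just made-up scores): the scores rank positives above negatives quite well, so the AUC is high, yet every score falls below the default 0.5 cutoff, so on a balanced dataset the accuracy collapses to 0.5.

import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(0)
y_true = np.r_[np.ones(500, dtype=int), np.zeros(500, dtype=int)]  # balanced classes
y_score = np.r_[rng.uniform(0.20, 0.45, 500),                      # positives score higher...
                rng.uniform(0.00, 0.25, 500)]                      # ...but all below 0.5

print(roc_auc_score(y_true, y_score))                       # ~0.98: ranking is nearly perfect
print(accuracy_score(y_true, (y_score > 0.5).astype(int)))  # 0.5: everything predicted negative

Re-tuning the cutoff (as in the threshold-selection sketch above) would restore the accuracy here, which is exactly the point: AUC is threshold-free, while accuracy depends on one particular threshold.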

Why the training process led to a classifier with such poor prediction performance is a different question, and one that is specific to your problem/data and the classification method you used.

In summary, ROC analysis tells you something about how well the samples of the positive class can be separated from the negative class, while the prediction accuracy hints at the actual performance of your classifier at a single decision threshold.

Answered by normanius on Dec 30 '22