
Reason for having high AUC and low accuracy in a balanced dataset

Given a balanced dataset (both classes are the same size), fitting an SVM model to it yields a high AUC (~0.9) but a low accuracy (~0.5).

I have no idea why this would happen. Can anyone explain this case to me?

Asked by Jamin on Jul 15 '16



1 Answer

I recently stumbled upon the same question. Here is what I figured out for myself - let me know if I'm wrong.

Before we ponder why the area under the ROC curve (AUC) can be high while accuracy is low, let's first recapitulate the meanings of these terms.

The receiver operating characteristic (ROC) curve plots the true positive rate TPR(t) against the false positive rate FPR(t) for varying decision thresholds (or prediction cutoffs) t.

TPR and FPR are defined as follows:

TPR = TP / P = TP / (TP+FN) = number of true positives / number of positives
FPR = FP / N = FP / (FP+TN) = number of false positives / number of negatives
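For concreteness, here is a minimal sketch of these two rates in Python; y_true and y_pred are placeholder arrays of 0/1 labels and hard predictions, not anything from the original question:

import numpy as np

def tpr_fpr(y_true, y_pred):
    # Count the four confusion-matrix cells (1 = "positive", 0 = "negative").
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)  # (TPR, FPR); assumes both classes are present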

In ROC analysis, the classifier is assumed to reduce to the following decision rule:

def classifier(observation, t):
    # Observations scoring at or below the threshold t go to the "negative"
    # class A; everything above it goes to the "positive" class B.
    if score_function(observation) <= t:
        return "A"  # "negative" class
    else:
        return "B"  # "positive" class

Think of the decision threshold t as a free parameter that is adjusted when training a classifier. (Not all classifiers have such a straightforward parametrization, but for now stick with logistic regression or simple thresholding, for which there is an obvious choice for such a parameter t.) During the training process, the optimal threshold t* is chosen such that some cost function is minimized.
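As a toy illustration of that last step, one could pick t* on held-out data by maximizing accuracy, i.e. the cost function is simply the misclassification rate. That choice is an assumption for the sketch below; your actual training procedure may minimize something else entirely.

import numpy as np

def choose_threshold(scores, y_true):
    # Try every observed score as a candidate cutoff and keep the one
    # that maximizes accuracy (equivalently, minimizes misclassification).
    thresholds = np.unique(scores)
    accuracies = [np.mean((scores > t).astype(int) == y_true) for t in thresholds]
    return thresholds[int(np.argmax(accuracies))]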

Given the training/test data, note that any choice of the parameter t determines which of the data points are true positives (TP), false positives (FP), true negatives (TN) or false negatives (FN). Hence, any choice of t also determines FPR(t) and TPR(t).

So we've seen the following: an ROC curve is a curve parametrized by the decision threshold t, with x = FPR(t) and y = TPR(t) over all possible values of t.

The area under the resulting ROC curve is called the AUC. It measures, for your training/test data, how well the classifier can discriminate between samples from the "positive" and the "negative" class. A perfect classifier's ROC curve would pass through the optimal point FPR(t*) = 0 and TPR(t*) = 1 and would yield an AUC of 1. A random classifier's ROC curve, however, follows the diagonal FPR(t) = TPR(t), yielding an AUC of 0.5.
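In practice you rarely trace the curve by hand; a minimal, self-contained sketch with scikit-learn (using made-up synthetic labels and scores purely for demonstration) looks like this:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)               # synthetic 0/1 labels
y_score = y_true + rng.normal(scale=0.8, size=200)  # noisy scores, for demo only

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) pair per cutoff
print("AUC =", roc_auc_score(y_true, y_score))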

Finally, accuracy is defined as the ratio of correctly labeled cases to the total number of cases:

accuracy = (TP+TN)/(Total number of cases) = (TP+TN)/(TP+FP+TN+FN)

So how can it be that the AUC is large while the accuracy is low at the same time? Well, this may happen if your classifier achieves good performance on the positive class (high AUC) at the cost of a high false negative rate (i.e. a low number of true positives).
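To make that concrete, here is a small synthetic sketch (not the asker's SVM, just made-up scores): the scores rank positives above negatives quite well, so the AUC is high, yet every score falls below the default 0.5 cutoff, so on a balanced dataset the accuracy collapses to 0.5.

import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(0)
y_true = np.r_[np.ones(500, dtype=int), np.zeros(500, dtype=int)]  # balanced classes
y_score = np.r_[rng.uniform(0.20, 0.45, 500),                      # positives score higher...
                rng.uniform(0.00, 0.25, 500)]                      # ...but all below 0.5

print(roc_auc_score(y_true, y_score))                       # ~0.98: ranking is nearly perfect
print(accuracy_score(y_true, (y_score > 0.5).astype(int)))  # 0.5: everything predicted negative

Re-tuning the cutoff (as in the threshold-selection sketch above) would restore the accuracy here, which is exactly the point: AUC is threshold-free, while accuracy depends on one particular threshold.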

Why the training process led to a classifier with such poor prediction performance is a different question, and one that is specific to your problem/data and the classification method you used.

In summary, ROC analysis tells you something about how well the samples of the positive class can be separated from the negative class, while the prediction accuracy hints at the actual performance of your classifier at a single decision threshold.

Answered by normanius on Dec 30 '22