I am doing k-fold cross-validation on an existing dataframe, and I need to get the AUC score. The problem is that sometimes the test data only contains 0s, and no 1s!
I tried using this example, but with different numbers:
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])
roc_auc_score(y_true, y_scores)
And I get this exception:
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
Is there any workaround that can make it work in such cases?
The roc_auc_score always ranges from 0 to 1, and it measures how well the model ranks the predicted probabilities. 0.5 is the baseline for random guessing, so you always want to be above 0.5.
The AUC for the ROC can be calculated using the roc_auc_score() function. Like the roc_curve() function, the AUC function takes both the true outcomes (0,1) from the test set and the predicted probabilities for the 1 class. It returns the AUC score between 0.0 and 1.0 for no skill and perfect skill respectively.
The Area Under the Curve (AUC) is a measure of a classifier's ability to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the model is at distinguishing between the positive and negative classes.
The ROC is also known as a relative operating characteristic curve, because it compares two operating characteristics, the True Positive Rate and the False Positive Rate, as the decision criterion changes. An ideal classifier has a ROC curve that reaches a true positive rate of 100% with zero false positives.
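For reference, here is a minimal call with made-up numbers where both classes are present in y_true, so the score is well defined:
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative labels and scores containing both classes
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc_score(y_true, y_scores))  # 0.75 for these illustrative values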
You could use try-except to prevent the error:
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])
try:
    roc_auc_score(y_true, y_scores)
except ValueError:
    pass
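A variation of the same idea, continuing with y_true and y_scores from the snippet above, is to record a fallback value for the fold instead of silently passing (the auc variable name is my own choice):
try:
    auc = roc_auc_score(y_true, y_scores)
except ValueError:
    # Only one class present in y_true for this fold, so the score is undefined;
    # fall back to 0.0 (or float("nan") if you prefer to skip the fold when averaging)
    auc = 0.0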
You can also set the roc_auc_score to a fixed value such as zero when only one class is present, as in the variation above. However, I wouldn't do this. I guess your test data is highly imbalanced, so I would suggest using stratified K-fold instead, so that you at least have both classes present in every fold.
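A minimal sketch of the stratified approach, assuming your features and labels are in arrays X and y; the random data and the LogisticRegression classifier here are placeholders for your own setup:
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Placeholder data: replace with your own X and y
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    # predict_proba[:, 1] is the predicted probability of the positive class
    y_scores = clf.predict_proba(X[test_idx])[:, 1]
    scores.append(roc_auc_score(y[test_idx], y_scores))

print(np.mean(scores))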
As the error notes, if only one class is present in the ground truth of a fold, the ROC AUC score is simply not defined in that case.
I'm against either throwing an exception (about what? This is the expected behaviour) or returning another metric (e.g. accuracy) in its place. The metric is not broken per se.
I don't feel like solving a data imbalance "issue" with a metric "fix". It would probably be better to use a different sampling strategy, if possible, or simply to join multiple batches that do satisfy the class population requirement.
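For example, joining the batches could mean pooling the out-of-fold predictions and computing a single AUC at the end. This is only a sketch with placeholder data and a placeholder classifier:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Placeholder data: replace with your own X and y
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

all_true, all_scores = [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    all_true.append(y[test_idx])
    all_scores.append(clf.predict_proba(X[test_idx])[:, 1])

# One pooled AUC over all folds: the pooled ground truth contains
# both classes even if an individual fold does not
print(roc_auc_score(np.concatenate(all_true), np.concatenate(all_scores)))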