I have some machine learning results that I don't quite understand. I am using Python scikit-learn on 2+ million samples with about 14 features. The classification of 'ab' looks pretty bad on the precision-recall curve, but the ROC curve for 'ab' looks just as good as the classification of most other groups. What can explain that?
Generally, ROC curves and precision-recall curves are used as follows: ROC curves are appropriate when there are roughly equal numbers of observations for each class, whereas precision-recall curves are preferable when there is a moderate to large class imbalance.
The main difference between the ROC and PR curves is that the former considers the false positive rate whereas the latter is based on the precision. That is why we first have a closer look at these two concepts for imbalanced data.
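To make the difference concrete, here is a small worked example. The confusion counts are made up for illustration (1,000 positives against 1,000,000 negatives) and do not come from the question's data. It shows how a false positive rate that looks excellent can coexist with poor precision when negatives vastly outnumber positives.

```python
def rates(tp, fp, tn, fn):
    """Return (false positive rate, recall, precision) from confusion counts."""
    fpr = fp / (fp + tn)        # share of all negatives that were flagged
    recall = tp / (tp + fn)     # share of all positives that were found
    precision = tp / (tp + fp)  # share of flagged samples that are truly positive
    return fpr, recall, precision


# Hypothetical counts: 1,000 positives, 1,000,000 negatives
fpr, recall, precision = rates(tp=900, fp=10_000, tn=990_000, fn=100)
print(f"FPR={fpr:.3f} recall={recall:.3f} precision={precision:.3f}")
# FPR=0.010 recall=0.900 precision=0.083
```

The false positive rate divides by the large pool of negatives, so 10,000 false alarms barely register. Precision divides by the number of predicted positives, where those same false alarms outnumber the true positives roughly eleven to one.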
The precision-recall curve shows the tradeoff between precision and recall at different thresholds. A high area under the curve represents both high recall and high precision, where high precision means few false positives among the predicted positives, and high recall means few false negatives among the actual positives.
Precision-Recall (PR) Curve: A PR curve is simply a graph with precision values on the y-axis and recall values on the x-axis. In other words, the PR curve plots TP/(TP+FP) on the y-axis and TP/(TP+FN) on the x-axis. Note that precision is also called the positive predictive value (PPV).
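As a rough sketch of how the points of such a curve arise, the snippet below sweeps a decision threshold over a tiny set of made-up labels and scores and records recall = TP/(TP+FN) and precision = TP/(TP+FP) at each step. The helper name pr_points is hypothetical. In scikit-learn, sklearn.metrics.precision_recall_curve serves this purpose, though its output ordering and endpoint conventions differ from this simplified version.

```python
def pr_points(y_true, scores):
    """Return (threshold, recall, precision) for each distinct score, highest first.

    Assumes binary labels (0/1) with at least one positive.
    """
    total_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((t, tp / total_pos, tp / (tp + fp)))
    return points


# Toy data for illustration only
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

for threshold, recall, precision in pr_points(y_true, scores):
    print(f"threshold={threshold:.2f} recall={recall:.2f} precision={precision:.2f}")
# threshold=0.80 recall=0.50 precision=1.00
# threshold=0.40 recall=0.50 precision=0.50
# threshold=0.35 recall=1.00 precision=0.67
# threshold=0.10 recall=1.00 precision=0.50
```

Lowering the threshold can only keep or raise recall, while precision may move in either direction. That is why the curve is read as a tradeoff.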
Class imbalance.
Unlike ROC curves, PR curves are very sensitive to class imbalance. If you optimize your classifier for a good ROC AUC on unbalanced data, you can still obtain poor precision-recall results.
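This effect can be reproduced with a small synthetic experiment, sketched below under stated assumptions. The scores are not from the question's model: positives and negatives are placed at evenly spaced quantiles of two unit-variance normal distributions (means 1.5 and 0), and the imbalanced case simply repeats every negative 100 times. The helpers roc_auc, average_precision and quantile_scores are hypothetical pure-Python stand-ins; in scikit-learn one would normally reach for roc_auc_score and average_precision_score.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby
from statistics import NormalDist


def roc_auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    neg_sorted = sorted(neg)
    wins2 = 0  # twice the number of correctly ranked pairs, kept as an integer
    for s in pos:
        lo = bisect_left(neg_sorted, s)   # negatives scoring strictly below s
        hi = bisect_right(neg_sorted, s)  # ...plus negatives tied with s
        wins2 += 2 * lo + (hi - lo)
    return wins2 / (2 * len(pos) * len(neg))


def average_precision(pos, neg):
    """Step-wise area under the PR curve: sum of (recall gain) * precision."""
    labelled = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    labelled.sort(key=lambda p: p[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    # Samples with identical scores share one threshold, so process them together.
    for _, group in groupby(labelled, key=lambda p: p[0]):
        labels = [y for _, y in group]
        new_tp = sum(labels)
        tp += new_tp
        fp += len(labels) - new_tp
        ap += (new_tp / len(pos)) * (tp / (tp + fp))
    return ap


def quantile_scores(mean, n):
    """n deterministic scores at evenly spaced quantiles of Normal(mean, 1)."""
    dist = NormalDist(mean, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


pos = quantile_scores(1.5, 200)
neg = quantile_scores(0.0, 200)
neg_imbalanced = neg * 100  # same score distribution, 100x as many negatives

for name, negatives in [("balanced 1:1", neg), ("imbalanced 1:100", neg_imbalanced)]:
    auc = roc_auc(pos, negatives)
    ap = average_precision(pos, negatives)
    print(f"{name}: ROC AUC={auc:.3f} average precision={ap:.3f}")
```

Because the ROC AUC depends only on how positives rank against negatives, repeating the negatives leaves it unchanged. Average precision drops sharply, since every threshold now admits 100 times as many false positives. A class like 'ab' that is rare relative to the rest of the data would be expected to show the same pattern: a reasonable ROC curve alongside a weak PR curve.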