How does choosing auc, error, or logloss as the eval_metric for XGBoost impact its performance? Assume data are unbalanced. How does it impact accuracy, recall, and precision?
Choosing between different evaluation metrics doesn't directly impact the model's performance. Evaluation metrics exist so that the user can evaluate their model; accuracy is one such metric, and so are precision and recall. The objective function, on the other hand, is what actually drives training and therefore shapes all of those evaluation metrics.
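To see this concretely, here is a minimal sketch using the standard xgboost Python API on made-up synthetic data: two boosters trained with the same objective but different `eval_metric` values come out identical, because (absent early stopping) the metric only changes what gets reported.

```python
import numpy as np
import xgboost as xgb

# Synthetic imbalanced binary data (made-up, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)  # ~10% positives

dtrain = xgb.DMatrix(X, label=y)

# The objective drives training; eval_metric only changes what is reported.
params = {"objective": "binary:logistic", "eval_metric": "auc"}
booster_auc = xgb.train(params, dtrain, num_boost_round=20)

params["eval_metric"] = "logloss"
booster_ll = xgb.train(params, dtrain, num_boost_round=20)

# Without early stopping, both boosters are trained identically:
assert np.allclose(booster_auc.predict(dtrain), booster_ll.predict(dtrain))
```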
For example, if one classifier yields a probability of 0.7 for label 1 and 0.3 for label 0, while a different classifier yields a probability of 0.9 for label 1 and 0.1 for label 0, the two will have different logloss values, even though both classify the label correctly (and so have identical error and accuracy).
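In code, with a hand-rolled logloss (just the standard binary cross-entropy formula):

```python
import numpy as np

def logloss(y_true, p):
    # Binary cross-entropy, averaged over samples.
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1])                    # true label is 1
print(logloss(y, np.array([0.7])))   # ~0.357
print(logloss(y, np.array([0.9])))   # ~0.105
# Both classifiers predict label 1 (probability > 0.5), so accuracy and
# 'error' are identical, but logloss distinguishes them.
```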
Personally, most of the time I use ROC AUC to evaluate a binary classifier, and if I want to look deeper, I look at the confusion matrix.
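For example, with scikit-learn (toy numbers, just to show the tools):

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Made-up predictions for six samples.
y_true  = [0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.3, 0.6, 0.2, 0.9, 0.4]
y_pred  = [1 if s > 0.5 else 0 for s in y_score]

print(roc_auc_score(y_true, y_score))   # ranking quality of the raw scores
# Rows = true labels, columns = predicted labels:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```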
When dealing with unbalanced data, one needs to know how unbalanced it is: is it a 30%/70% ratio or a 0.1%/99.9% ratio? I've read an article arguing that precision-recall is a better evaluation for highly unbalanced data (see the sketch after the links below).
Here is some more reading material:
Handling highly imbalanced classes, and why the Receiver Operating Characteristic curve (ROC curve) should not be used and the precision/recall curve should be preferred in highly imbalanced situations
ROC and precision-recall with imbalanced datasets
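As a rough illustration of why, here is a sketch on synthetic, heavily imbalanced data (all numbers are made up for demonstration): ROC AUC can look respectable while the area under the precision-recall curve, which scikit-learn reports as average precision, stays much lower.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Highly imbalanced toy data: ~1% positives, overlapping score distributions.
rng = np.random.default_rng(0)
y = (rng.random(100_000) < 0.01).astype(int)
scores = np.where(y == 1,
                  rng.normal(1.0, 1.0, y.size),   # positives score higher...
                  rng.normal(0.0, 1.0, y.size))   # ...but distributions overlap

print("ROC AUC:          ", roc_auc_score(y, scores))
print("Average precision:", average_precision_score(y, scores))
# ROC AUC looks healthy because true negatives dominate the false positive
# rate; average precision is far lower and more honest about how hard it is
# to find the rare positives.
```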
The only way the evaluation metric can impact your model's accuracy (or any other evaluation metric) is when using early_stopping. early_stopping decides when to stop training additional boosters according to your evaluation metric, and was designed to prevent over-fitting.
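A minimal sketch of that interaction, again with made-up data and the standard xgboost Python API; switching `eval_metric` to `"logloss"` or `"error"` can stop training at a different round and hence yield a different final model:

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Synthetic data with some signal in the first feature (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=2000) > 2.5).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_va, label=y_va)

params = {"objective": "binary:logistic", "eval_metric": "auc"}
# With early stopping, the choice of eval_metric matters: training stops
# once validation AUC has not improved for 20 rounds.
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dvalid, "validation")],
    early_stopping_rounds=20,
)
print("best iteration:", booster.best_iteration)
```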