Making sure I am getting this right:
If we use sklearn.metrics.log_loss standalone, i.e. log_loss(y_true,y_pred), it generates a positive score -- the smaller the score, the better the performance.
However, if we use 'neg_log_loss' as the scoring scheme, e.g. in cross_val_score, the score is negative -- the higher the score, the better the performance.
And this is because the scoring scheme is built to be consistent with the other scoring schemes: since higher is generally better, the usual log_loss is negated to follow that convention, and it is done solely for that purpose. Is this understanding correct?
[Background: I got positive scores from metrics.log_loss and negative scores from 'neg_log_loss', and both refer to the same documentation page.]
sklearn.metrics.log_loss implements the error metric as it is typically defined, which, like most error metrics, is a positive number. It is a metric that is generally minimized (like mean squared error for regression), in contrast to metrics such as accuracy, which are maximized.
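A minimal sketch of the standalone metric, using made-up labels and predicted probabilities: the returned value is positive, and a smaller value indicates better predictions.

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_pred_good = [0.1, 0.9, 0.8, 0.2]   # confident, mostly correct probabilities
y_pred_poor = [0.5, 0.5, 0.5, 0.5]   # uninformative probabilities

print(log_loss(y_true, y_pred_good))  # small positive value (~0.16)
print(log_loss(y_true, y_pred_poor))  # larger positive value (~0.69)
```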
neg_log_loss is hence a technicality that turns the metric into a utility value, which lets sklearn's optimizing functions and classes (for instance cross_val_score, GridSearchCV, RandomizedSearchCV, and others) always maximize the score without having to change their behavior for each metric.
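A minimal sketch of that behavior, assuming a synthetic dataset from make_classification and a LogisticRegression classifier: with scoring='neg_log_loss', cross_val_score returns the negated log loss, so the values are negative and higher (closer to zero) is better.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression(max_iter=1000)

scores = cross_val_score(clf, X, y, cv=5, scoring='neg_log_loss')
print(scores)          # negative values, e.g. [-0.28, -0.31, ...]
print(-scores.mean())  # negating recovers the usual (positive) log loss
```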