sklearn metrics.log_loss is positive vs. scoring 'neg_log_loss' is negative

Making sure I am getting this right:

If we use sklearn.metrics.log_loss standalone, i.e. log_loss(y_true, y_pred), it returns a positive score -- the smaller the score, the better the performance.

However, if we use 'neg_log_loss' as the scoring scheme, e.g. in cross_val_score, the score is negative -- the bigger the score, the better the performance.

And this is because the scoring scheme is built to be consistent with the other scoring schemes: since higher is generally better, the usual log_loss is negated to follow that convention, and it is done solely for that purpose. Is this understanding correct?

[Background: I got positive scores from metrics.log_loss, and negative scores from 'neg_log_loss', and both refer to the same documentation page.]
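The sign difference the question describes can be reproduced in a few lines. This is a minimal sketch on a synthetic dataset (the dataset and model are illustrative assumptions, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

# Toy binary classification problem, purely for illustration.
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

# Standalone metric: a positive error value, lower is better.
standalone = log_loss(y, clf.predict_proba(X))
print(standalone)  # positive

# Scorer used by cross_val_score: the negated metric, higher is better.
cv_scores = cross_val_score(LogisticRegression(), X, y, scoring='neg_log_loss')
print(cv_scores)   # negative
```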

Max asked Mar 28 '17

1 Answer

sklearn.metrics.log_loss is an implementation of the log-loss error metric as typically defined, and, like most error metrics, it is a positive number. It is a metric that is generally minimized (like mean squared error for regression), in contrast to metrics such as accuracy, which are maximized.

neg_log_loss is hence a technicality: it turns the error into a utility value, so that the optimizing functions and classes of sklearn (for instance cross_val_score, GridSearchCV, RandomizedSearchCV, and others) can always maximize the score, without having to change their behavior for each metric.
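One way to see that the scorer is exactly the negated metric is via sklearn.metrics.get_scorer. A small sketch, again on an assumed toy dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, get_scorer

# Illustrative data and model, not from the original question.
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

scorer = get_scorer('neg_log_loss')
utility = scorer(clf, X, y)                 # higher is better (negative)
error = log_loss(y, clf.predict_proba(X))   # lower is better (positive)

# The scorer's utility is just the metric with its sign flipped.
print(np.isclose(utility, -error))
```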

Marcus V. answered Jan 05 '23