
Why is the log loss negative?

I just applied the log loss in sklearn for logistic regression: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html

My code looks something like this:

from sklearn import cross_validation
from sklearn.cross_validation import KFold

def perform_cv(clf, X, Y, scoring):
    # Outer 5-fold split; only the training indices are used below
    kf = KFold(X.shape[0], n_folds=5, shuffle=True)
    kf_scores = []
    for train, _ in kf:
        X_sub = X[train, :]
        Y_sub = Y[train]
        # Score each inner 5-fold split with 'log_loss'
        scores = cross_validation.cross_val_score(clf, X_sub, Y_sub, cv=5, scoring='log_loss')
        kf_scores.append(scores.mean())
    return kf_scores

However, I'm wondering why the resulting log losses are negative. I'd expect them to be positive, since according to the documentation (see the link above) the log loss is multiplied by -1 to turn it into a positive number.

Am I doing something wrong here?

asked Oct 09 '14 by toom


People also ask

Why can't logarithms be negative?

While the value of a logarithm itself can be positive or negative, the base and the argument of the log function are a different story: the argument of a log function can only be positive.

What is log loss and how does it work?

Log loss (i.e. cross-entropy loss) evaluates the performance by comparing the actual class labels and the predicted probabilities. The comparison is quantified using cross-entropy.
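As a rough illustration of that comparison, here is a minimal sketch with hypothetical labels and predicted probabilities, checking a manual binary cross-entropy against sklearn.metrics.log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Hypothetical binary labels and predicted probabilities of the positive class
y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])

# Manual binary cross-entropy: mean of -[y*log(p) + (1-y)*log(1-p)]
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print(manual)                    # ~0.299
print(log_loss(y_true, y_prob))  # same positive value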

What is log loss in Kaggle?

When it comes to classification tasks, log loss is one of the most commonly used metrics. It is also known as the cross-entropy loss. If you follow or join Kaggle competitions, you will see that log loss is the predominant choice of evaluation metric. In this post, we will see what makes log loss the number one choice.

What is the log loss of being 90% sure?

For instance, -log(0.9) is equal to 0.10536 and -log(0.8) is equal to 0.22314. Thus, being 90% sure results in a lower log loss than being 80% sure. I explained the concepts of entropy, cross-entropy, and log loss in detail in a separate post if you'd like to read further.
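Those numbers are easy to verify (natural logarithm, which is what sklearn uses):

import math

print(-math.log(0.9))  # 0.10536...
print(-math.log(0.8))  # 0.22314...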


2 Answers

Yes, this is supposed to happen. It is not a 'bug' as others have suggested. The actual log loss is simply the positive version of the number you're getting.

scikit-learn's unified scoring API always maximizes the score, so metrics that need to be minimized are negated so the API can treat them uniformly. The returned score is therefore negated when the underlying metric should be minimized, and left positive when it should be maximized.

This is also described in "sklearn GridSearchCV with Pipeline" and in "scikit-learn cross validation, negative values with mean squared error".
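A minimal sketch of this sign convention on synthetic data (the 'log_loss' scoring string matches the older sklearn API used in the question; newer releases rename it 'neg_log_loss'):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import cross_val_score  # sklearn <= 0.17, as in the question

# Synthetic binary classification data
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
Y = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)

clf = LogisticRegression()
scores = cross_val_score(clf, X, Y, cv=5, scoring='log_loss')

print(scores)          # all negative, because the scorer maximizes
print(-scores.mean())  # the actual (positive) average log loss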

answered Oct 10 '22 by AN6U5


A similar discussion can be found here.

This way, a higher score means better performance (less loss).
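A tiny illustration of that convention, with hypothetical probabilities for two models:

from sklearn.metrics import log_loss

# Model A is 90% sure of the true class, model B only 60% sure
loss_a = log_loss([0, 1], [[0.9, 0.1], [0.1, 0.9]])
loss_b = log_loss([0, 1], [[0.6, 0.4], [0.4, 0.6]])

# As losses, lower is better; as sklearn scores (negated), higher is better
print(loss_a < loss_b)    # True
print(-loss_a > -loss_b)  # True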

answered Oct 10 '22 by lanpa