I just applied the log loss in sklearn for logistic regression: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
My code looks something like this:
from sklearn import cross_validation
from sklearn.cross_validation import KFold

def perform_cv(clf, X, Y, scoring):
    kf = KFold(X.shape[0], n_folds=5, shuffle=True)
    kf_scores = []
    for train, _ in kf:
        X_sub = X[train, :]
        Y_sub = Y[train]
        # Apply 'log_loss' as the scoring metric
        scores = cross_validation.cross_val_score(clf, X_sub, Y_sub, cv=5, scoring='log_loss')
        kf_scores.append(scores.mean())
    return kf_scores
However, I'm wondering why the resulting logarithmic losses are negative. I'd expect them to be positive since in the documentation (see my link above) the log loss is multiplied by a -1 in order to turn it into a positive number.
Am I doing something wrong here?
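(For what it's worth, calling the metric directly does return a positive number, so the sign flip must come from the scoring machinery rather than from log_loss itself. A minimal check with made-up labels and probabilities, just for illustration:)

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of class 1

# The metric itself is positive, as the documentation describes.
loss = log_loss(y_true, y_prob)
print(loss)  # roughly 0.198, always > 0
```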
Yes, this is supposed to happen. It is not a 'bug' as others have suggested. The actual log loss is simply the positive version of the number you're getting.
Scikit-learn's unified scoring API always maximizes the score, so metrics that should be minimized are negated before being returned: a loss comes back as a negative number, while a metric that should be maximized is left positive.
This is also described in sklearn GridSearchCV with Pipeline and in scikit-learn cross validation, negative values with mean squared error; a similar discussion can be found here.
In this way, a higher score means better performance (less loss).
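You can see both behaviors side by side. A small sketch (synthetic data via make_classification, just for illustration): note that in sklearn 0.18+ the scorer string 'log_loss' was renamed 'neg_log_loss', which makes the negation explicit; the older 'log_loss' string used in the question negated the value in exactly the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Calling the metric directly gives the actual (positive) log loss...
clf.fit(X, y)
direct = log_loss(y, clf.predict_proba(X))
print(direct > 0)                  # True

# ...but the unified scoring API negates it ("greater is better"),
# so every cross-validated score comes back negative.
scores = cross_val_score(clf, X, y, cv=5, scoring='neg_log_loss')
print(all(s < 0 for s in scores))  # True

# Flip the sign to recover the positive mean log loss.
print(-scores.mean() > 0)          # True
```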