I prepared several models for binary classification of documents in the fraud field, and I calculated the log loss for all of them. I thought log loss essentially measures the confidence of the predictions and that it should lie in the range [0, 1]. I believe it is an important measure in classification when the predicted class alone is not sufficient for evaluation purposes. So if two models have accuracy, recall, and precision that are quite close, but one has a lower log loss, it should be selected, assuming there are no other parameters/metrics (such as time or cost) in the decision process.
The log loss for the decision tree is 1.57; for all other models it is in the 0-1 range. How do I interpret this score?
For simplicity, let's assume the value of a pixel can go from 0 to 255, and we use Mean Squared Error as our cost function. In that setting, a loss of 1 would be tiny, while a loss of 100 would be really high. However, whether the loss is high or low in absolute terms is not the most important inference we can draw from it.
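To make the scale-dependence concrete, here is a minimal sketch in plain Python; the pixel values below are made-up illustrative numbers:

```python
def mse(predictions, targets):
    """Mean Squared Error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [120, 200, 35, 250]       # hypothetical true pixel values in [0, 255]

close = [121, 199, 36, 249]         # off by 1 pixel each -> MSE of 1 (tiny)
far   = [110, 210, 25, 240]         # off by 10 pixels each -> MSE of 100 (large)

print(mse(close, targets))          # 1.0
print(mse(far, targets))            # 100.0
```

The same numeric loss would mean something very different if the pixel range were, say, [0, 1], which is why the raw magnitude alone tells us little.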
Log loss is indicative of how close the predicted probability is to the corresponding actual/true value (0 or 1 in the case of binary classification). The more the predicted probability diverges from the actual value, the higher the log loss.
The log loss is simply L(p) = −log(p), where p is the probability attributed to the true class. So L(p) = 0 is good: we attributed probability 1 to the right class. L(p) = +∞ is the worst case: we attributed probability 0 to the actual class.
For any given problem, a lower log loss value means better predictions. Mathematical interpretation: log loss is the negative average of the logs of the corrected predicted probabilities for each instance, where the "corrected" probability is the probability the model assigned to the class that actually occurred.
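That "negative average of logs of corrected probabilities" can be sketched in a few lines of Python; the labels and probabilities below are made-up examples:

```python
import math

def binary_log_loss(y_true, y_prob):
    """Negative average log of the 'corrected' probabilities, i.e. the
    probability the model assigned to the class that actually occurred."""
    corrected = [p if y == 1 else 1 - p for y, p in zip(y_true, y_prob)]
    return -sum(math.log(p) for p in corrected) / len(corrected)

y_true = [1, 0, 1, 1]            # actual classes (illustrative)
y_prob = [0.9, 0.2, 0.8, 0.65]   # predicted P(class = 1) for each instance

print(round(binary_log_loss(y_true, y_prob), 4))  # 0.2456
```

A perfectly confident, perfectly correct model would score 0; any hedging pushes the score up.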
It's important to remember that log loss does not have an upper bound: it lies in the range [0, ∞).
From Kaggle we can find a formula for log loss:

logloss = −(1/N) ∑i ∑j yij log(pij)

in which yij is 1 for the correct class of instance i and 0 for the other classes, and pij is the probability assigned to class j for instance i.
If we look at the case where the average log loss exceeds 1, it is when log(pij) < −1 for the true class j. This means the predicted probability for that class is less than exp(−1), or around 0.368. So, seeing a log loss greater than 1 is expected whenever your model assigns, on average, less than about a 37% probability estimate to the actual class.
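A quick sanity check of that threshold in Python:

```python
import math

threshold = math.exp(-1)      # ~0.3679: below this, -log(p) exceeds 1
print(threshold)

print(-math.log(0.368))       # probability just above the threshold -> loss just under 1
print(-math.log(0.30))        # probability below the threshold -> loss above 1
```

This is also one plausible reading of the 1.57 score in the question: a decision tree often outputs hard 0/1 probabilities, so a handful of confident mistakes (corrected probability near 0) can blow up the average.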
We can also see this by plotting the log loss for various probability estimates.
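A minimal plotting sketch, assuming matplotlib is available (the output filename is arbitrary):

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

# Log loss -log(p) for the true class over a grid of probabilities,
# avoiding p = 0, where the loss is infinite.
ps = [i / 1000 for i in range(1, 1000)]
losses = [-math.log(p) for p in ps]

plt.plot(ps, losses)
plt.axhline(1.0, linestyle="--")            # loss = 1 reference line
plt.axvline(math.exp(-1), linestyle="--")   # p = exp(-1) ~ 0.368
plt.xlabel("predicted probability of the true class")
plt.ylabel("log loss")
plt.savefig("log_loss_curve.png")
```

The curve is flat near p = 1 and explodes toward infinity as p approaches 0, which is exactly why a few badly misclassified instances can dominate the average.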