I prepared several models for binary classification of documents in the fraud field, and I calculated the log loss for all of them. I thought log loss essentially measures the confidence of the predictions and that it should lie in the range [0, 1]. I believe it is an important measure in classification when the predicted class alone is not sufficient for evaluation purposes. So if two models have accuracy, recall, and precision that are quite close, but one has a lower log loss, it should be selected, assuming there are no other parameters/metrics (such as time or cost) in the decision process.
The log loss for the decision tree is 1.57; for all other models it is in the 0-1 range. How do I interpret this score?
For simplicity, let's assume the value of a pixel can go from 0 to 255, and we use Mean Squared Error as our cost function. In that setting, a loss of 1 would be tiny, while a loss of 100 would be really high. However, whether the loss is high or low in absolute terms is not the most important inference we can draw from it.
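To make the scale-dependence concrete, here is a minimal sketch in plain Python; the pixel values below are made-up illustrative numbers:

```python
def mse(predictions, targets):
    """Mean Squared Error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [120, 200, 35, 250]       # hypothetical true pixel values in [0, 255]

close = [121, 199, 36, 249]         # off by 1 pixel each -> MSE of 1 (tiny)
far   = [110, 210, 25, 240]         # off by 10 pixels each -> MSE of 100 (large)

print(mse(close, targets))          # 1.0
print(mse(far, targets))            # 100.0
```

The same numeric loss would mean something very different if the pixel range were, say, [0, 1], which is why the raw magnitude alone tells us little.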
Log loss is indicative of how close the predicted probability is to the corresponding actual/true value (0 or 1 in the case of binary classification). The more the predicted probability diverges from the actual value, the higher the log loss.
The log loss is simply L(p) = −log(p), where p is the probability attributed to the true class. So L(p) = 0 is good: we attributed probability 1 to the right class. L(p) = +∞ is the worst case: we attributed probability 0 to the actual class.
For any given problem, a lower log loss value means better predictions. Mathematical interpretation: log loss is the negative average of the logs of the corrected predicted probabilities for each instance, where the "corrected" probability is the probability the model assigned to the class that actually occurred.
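That "negative average of logs of corrected probabilities" can be sketched in a few lines of Python; the labels and probabilities below are made-up examples:

```python
import math

def binary_log_loss(y_true, y_prob):
    """Negative average log of the 'corrected' probabilities, i.e. the
    probability the model assigned to the class that actually occurred."""
    corrected = [p if y == 1 else 1 - p for y, p in zip(y_true, y_prob)]
    return -sum(math.log(p) for p in corrected) / len(corrected)

y_true = [1, 0, 1, 1]            # actual classes (illustrative)
y_prob = [0.9, 0.2, 0.8, 0.65]   # predicted P(class = 1) for each instance

print(round(binary_log_loss(y_true, y_prob), 4))  # 0.2456
```

A perfectly confident, perfectly correct model would score 0; any hedging pushes the score up.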
It's important to remember that log loss does not have an upper bound: it lies in the range [0, ∞).
From Kaggle we can find a formula for log loss:

logloss = −(1/N) ∑i ∑j yij log(pij)

in which yij is 1 for the correct class of instance i and 0 for the other classes, and pij is the probability assigned to class j for instance i.
If we look at the case where the average log loss exceeds 1, it is when log(pij) < −1 for the true class j. This means the predicted probability for that class is less than exp(−1), or around 0.368. So, seeing a log loss greater than 1 is expected whenever your model assigns, on average, less than about a 37% probability estimate to the actual class.
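A quick sanity check of that threshold in Python:

```python
import math

threshold = math.exp(-1)      # ~0.3679: below this, -log(p) exceeds 1
print(threshold)

print(-math.log(0.368))       # probability just above the threshold -> loss just under 1
print(-math.log(0.30))        # probability below the threshold -> loss above 1
```

This is also one plausible reading of the 1.57 score in the question: a decision tree often outputs hard 0/1 probabilities, so a handful of confident mistakes (corrected probability near 0) can blow up the average.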
We can also see this by plotting the log loss for various probability estimates.
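A minimal plotting sketch, assuming matplotlib is available (the output filename is arbitrary):

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

# Log loss -log(p) for the true class over a grid of probabilities,
# avoiding p = 0, where the loss is infinite.
ps = [i / 1000 for i in range(1, 1000)]
losses = [-math.log(p) for p in ps]

plt.plot(ps, losses)
plt.axhline(1.0, linestyle="--")            # loss = 1 reference line
plt.axvline(math.exp(-1), linestyle="--")   # p = exp(-1) ~ 0.368
plt.xlabel("predicted probability of the true class")
plt.ylabel("log loss")
plt.savefig("log_loss_curve.png")
```

The curve is flat near p = 1 and explodes toward infinity as p approaches 0, which is exactly why a few badly misclassified instances can dominate the average.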