 

Why are my implementations of the log-loss (or cross-entropy) not producing the same results?

I was reading up on log-loss and cross-entropy, and it seems like there are two approaches for calculating it, based on the following equation.

[equation image from the fast.ai wiki: log loss = -(1/N) Σ_i Σ_j y_ij log(p_ij) = -(1/N) Σ_i ( y_i log(y_hat_i) + (1 - y_i) log(1 - y_hat_i) )]

The first one is the following.

import numpy as np
from sklearn.metrics import log_loss


def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce


predictions = np.array([[0.25,0.25,0.25,0.25],
                        [0.01,0.01,0.01,0.97]])
targets = np.array([[1,0,0,0],
                   [0,0,0,1]])

x = cross_entropy(predictions, targets)
print(log_loss(targets, predictions), 'our_answer:', ans)

The output of the previous program is 0.7083767843022996 our_answer: 0.71355817782, which is almost the same. So that's not the issue.

The above implementation corresponds to the middle part of the equation above.

The second approach is based on the RHS part of the equation above.

res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += - class_act * np.log(class_pred) - (1-class_act) * np.log(1-class_pred)

print(res/len(targets))

And the output is 1.1549753967602232, which is not quite the same.

I have tried the same implementation with NumPy, but it didn't work either. What am I doing wrong?

PS: I am also curious: -y log(y_hat) seems to me to be the same as -Σ p_i log(q_i), so how come there is a -(1-y) log(1-y_hat) part? Clearly I am misunderstanding how -y log(y_hat) is to be calculated.
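
For concreteness, here is a small sketch of what I mean (using the first row of the arrays above): for a one-hot target row, -sum(y_i * log(p_i)) keeps only the true-class term, so I don't see where the (1-y) part would come from.

import numpy as np

y = np.array([1, 0, 0, 0])              # one-hot target row
p = np.array([0.25, 0.25, 0.25, 0.25])  # predicted probabilities for that row

# only the true class contributes to the sum
print(-np.sum(y * np.log(p)))  # 1.3862943611198906
print(-np.log(0.25))           # same value, i.e. -log(p of the true class)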

asked Mar 25 '18 07:03 by Vikash Singh


1 Answer

I cannot reproduce the difference in the results you report in the first part (you also refer to an ans variable, which you do not seem to define; I guess it is x):

import numpy as np
from sklearn.metrics import log_loss


def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce

predictions = np.array([[0.25,0.25,0.25,0.25],
                        [0.01,0.01,0.01,0.97]])
targets = np.array([[1,0,0,0],
                   [0,0,0,1]])

The results:

cross_entropy(predictions, targets)
# 0.7083767843022996

log_loss(targets, predictions)
# 0.7083767843022996

log_loss(targets, predictions) == cross_entropy(predictions, targets)
# True

Your cross_entropy function seems to work fine.

Regarding the second part:

Clearly I am misunderstanding how -y log (y_hat) is to be calculated.

Indeed, reading the fast.ai wiki you have linked to more carefully, you'll see that the RHS of the equation holds only for binary classification (where one of y and 1-y will always be zero), which is not the case here: you have a 4-class multinomial classification. So, the correct formulation is

res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += - class_act * np.log(class_pred)

i.e. discarding the subtraction of (1-class_act) * np.log(1-class_pred).

Result:

res/len(targets)
# 0.7083767843022996

res/len(targets) == log_loss(targets, predictions)
# True
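
Just to illustrate the binary case: when each target is a single 0/1 label, the -(1-y) * log(1-y_hat) term is exactly what makes the formula agree with sklearn's log_loss. A minimal sketch with hypothetical labels and predicted probabilities (not your data):

import numpy as np
from sklearn.metrics import log_loss

# hypothetical binary labels and predicted probabilities P(y = 1)
bin_targets = np.array([1, 0, 1, 1])
bin_preds = np.array([0.9, 0.2, 0.7, 0.6])

# binary cross-entropy, i.e. the RHS form with the (1 - y) term kept
bce = -np.mean(bin_targets * np.log(bin_preds)
               + (1 - bin_targets) * np.log(1 - bin_preds))

print(bce)                               # ~0.299
print(log_loss(bin_targets, bin_preds))  # same value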

On a more general level (the mechanics of log loss & accuracy for binary classification), you may find this answer useful.

answered Sep 21 '22 03:09 by desertnaut