 

Why are my implementations of the log-loss (or cross-entropy) not producing the same results?

I was reading up on log-loss and cross-entropy, and it seems like there are two approaches for calculating it, based on the following equation.

[equation image from the fast.ai wiki: log loss = -(1/N) Σ_i Σ_j y_ij log(p_ij) = -(1/N) Σ_i ( y_i log(y_hat_i) + (1 - y_i) log(1 - y_hat_i) )]

The first one is the following.

import numpy as np
from sklearn.metrics import log_loss


def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce


predictions = np.array([[0.25,0.25,0.25,0.25],
                        [0.01,0.01,0.01,0.97]])
targets = np.array([[1,0,0,0],
                   [0,0,0,1]])

x = cross_entropy(predictions, targets)
print(log_loss(targets, predictions), 'our_answer:', ans)

The output of the previous program is 0.7083767843022996 our_answer: 0.71355817782, which is almost the same. So that's not the issue.

The above implementation corresponds to the middle part of the equation above.

The second approach is based on the RHS part of the equation above.

res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += - class_act * np.log(class_pred) - (1-class_act) * np.log(1-class_pred)

print(res/len(targets))

And the output is 1.1549753967602232, which is not quite the same.

I have tried the same implementation with NumPy, but it didn't work either. What am I doing wrong?

PS: I am also curious: -y log(y_hat) seems to me to be the same as -Σ p_i log(q_i), so how come there is a -(1-y) log(1-y_hat) part? Clearly I am misunderstanding how -y log(y_hat) is to be calculated.
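
For concreteness, here is a small sketch of what I mean (using the first row of the arrays above): for a one-hot target row, -sum(y_i * log(p_i)) keeps only the true-class term, so I don't see where the (1-y) part would come from.

import numpy as np

y = np.array([1, 0, 0, 0])              # one-hot target row
p = np.array([0.25, 0.25, 0.25, 0.25])  # predicted probabilities for that row

# only the true class contributes to the sum
print(-np.sum(y * np.log(p)))  # 1.3862943611198906
print(-np.log(0.25))           # same value, i.e. -log(p of the true class)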

asked Mar 25 '18 07:03 by Vikash Singh


1 Answer

I cannot reproduce the difference in the results you report in the first part (you also refer to an ans variable, which you do not seem to define; I guess it is x):

import numpy as np
from sklearn.metrics import log_loss


def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce

predictions = np.array([[0.25,0.25,0.25,0.25],
                        [0.01,0.01,0.01,0.97]])
targets = np.array([[1,0,0,0],
                   [0,0,0,1]])

The results:

cross_entropy(predictions, targets)
# 0.7083767843022996

log_loss(targets, predictions)
# 0.7083767843022996

log_loss(targets, predictions) == cross_entropy(predictions, targets)
# True

Your cross_entropy function seems to work fine.

Regarding the second part:

Clearly I am misunderstanding how -y log (y_hat) is to be calculated.

Indeed, reading the fast.ai wiki you have linked to more carefully, you'll see that the RHS of the equation holds only for binary classification (where one of y and 1-y will always be zero), which is not the case here: you have a 4-class multinomial classification. So, the correct formulation is

res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += - class_act * np.log(class_pred)

i.e. discarding the subtraction of (1-class_act) * np.log(1-class_pred).

Result:

res/len(targets)
# 0.7083767843022996

res/len(targets) == log_loss(targets, predictions)
# True
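
Just to illustrate the binary case: when each target is a single 0/1 label, the -(1-y) * log(1-y_hat) term is exactly what makes the formula agree with sklearn's log_loss. A minimal sketch with hypothetical labels and predicted probabilities (not your data):

import numpy as np
from sklearn.metrics import log_loss

# hypothetical binary labels and predicted probabilities P(y = 1)
bin_targets = np.array([1, 0, 1, 1])
bin_preds = np.array([0.9, 0.2, 0.7, 0.6])

# binary cross-entropy, i.e. the RHS form with the (1 - y) term kept
bce = -np.mean(bin_targets * np.log(bin_preds)
               + (1 - bin_targets) * np.log(1 - bin_preds))

print(bce)                               # ~0.299
print(log_loss(bin_targets, bin_preds))  # same value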

On a more general level (the mechanics of log loss & accuracy for binary classification), you may find this answer useful.

answered Sep 21 '22 03:09 by desertnaut