I'm a bit confused by the cross entropy loss in PyTorch.
Considering this example:
import torch
import torch.nn as nn
from torch.autograd import Variable

output = Variable(torch.FloatTensor([0,0,0,1])).view(1, -1)
target = Variable(torch.LongTensor([3]))

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
I would expect the loss to be 0. But I get:
Variable containing: 0.7437 [torch.FloatTensor of size 1]
As far as I know, cross entropy can be calculated like this:
H(p, q) = -\sum_x p(x) * log(q(x))
But shouldn't the result then be 1*log(1) = 0?
I also tried different inputs, like one-hot encoded targets, but those don't work at all, so it seems the input shape I am passing to the loss function is okay.
I would be really grateful if someone could help me out and tell me where my mistake is.
Thanks in advance!
This criterion computes the cross entropy loss between input and target. It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
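For instance, a minimal sketch of passing such a weight tensor (the three-class setup and the weight values here are made up for illustration):

import torch
import torch.nn as nn

# Hypothetical 3-class problem where class 2 is rare, so it gets a larger weight
class_weights = torch.tensor([0.2, 0.3, 0.5])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)            # batch of 4 samples, raw scores for 3 classes
targets = torch.tensor([0, 2, 1, 2])  # ground-truth class indices
print(criterion(logits, targets))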
Cross-entropy is the average number of bits needed to encode data from distribution A when using a code optimized for distribution B. In machine learning it comes up when a model produces predictions: the model is trained and evaluated by comparing its predicted distribution with the actual one.
Cross entropy can be used to define a loss function (cost function) in machine learning and optimization. It is defined on probability distributions, not single values. It works for classification because classifier output is (often) a probability distribution over class labels.
Cross entropy loss is a metric used to measure how well a classification model performs. The loss (or error) is a non-negative number, with 0 corresponding to a perfect model, and the goal is generally to get your model's loss as close to 0 as possible.
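To make the definition concrete, here is a small illustrative sketch (not the poster's code) of the textbook formula H(p, q) = -\sum_x p(x) * log(q(x)), where p is the true one-hot distribution and q is the predicted one; the distributions below are made up:

import torch

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log(q(x)); p and q are probability distributions
    return -(p * q.log()).sum()

p = torch.tensor([0., 0., 0., 1.])      # one-hot "true" distribution
q = torch.tensor([0.1, 0.2, 0.3, 0.4])  # some predicted probabilities (made up)
print(cross_entropy(p, q))              # tensor(0.9163), i.e. -log(0.4)

If q put all of its mass on class 3, the result would indeed be -1*log(1) = 0, which is what the question expects.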
In your example you are treating the output [0, 0, 0, 1] as probabilities, as required by the mathematical definition of cross entropy. But PyTorch treats them as raw outputs (scores) that don't need to sum to 1, and it first converts them into probabilities, for which it uses the softmax function.
So H(p, q) becomes H(p, softmax(output)).
Translating the output [0, 0, 0, 1] into probabilities:
softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]
whence:
-log(0.4754) = 0.7437
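These numbers can be reproduced with PyTorch itself; the following sketch (written with plain tensors rather than the Variable wrapper from the question) also shows that CrossEntropyLoss behaves like LogSoftmax followed by NLLLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F

output = torch.tensor([[0., 0., 0., 1.]])   # raw scores, shape (1, 4)
target = torch.tensor([3])                  # index of the correct class

probs = F.softmax(output, dim=1)
print(probs)                    # tensor([[0.1749, 0.1749, 0.1749, 0.4754]])
print(-torch.log(probs[0, 3]))  # tensor(0.7437), -log of the target class probability

# CrossEntropyLoss applied to the raw scores gives the same value
print(nn.CrossEntropyLoss()(output, target))               # tensor(0.7437)
print(nn.NLLLoss()(F.log_softmax(output, dim=1), target))  # tensor(0.7437)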
Your understanding is correct, but PyTorch doesn't compute cross entropy in that way. PyTorch uses the following formula:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j]))) = -x[class] + log(\sum_j exp(x[j]))
Since, in your scenario, x = [0, 0, 0, 1] and class = 3, if you evaluate the above expression you get:
loss(x, class) = -1 + log(exp(0) + exp(0) + exp(0) + exp(1)) = 0.7437
PyTorch uses the natural logarithm here.
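A quick sketch (again with plain tensors instead of Variable) that evaluates this formula directly and compares it with the criterion:

import torch
import torch.nn as nn

x = torch.tensor([0., 0., 0., 1.])   # raw scores for a single sample
cls = 3                              # target class index

# loss(x, class) = -x[class] + log(sum_j exp(x[j]))
manual = -x[cls] + torch.logsumexp(x, dim=0)
print(manual)                                          # tensor(0.7437)

criterion = nn.CrossEntropyLoss()
print(criterion(x.unsqueeze(0), torch.tensor([cls])))  # tensor(0.7437)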