I'm a bit confused by the cross entropy loss in PyTorch.
Considering this example:
import torch
import torch.nn as nn
from torch.autograd import Variable

output = Variable(torch.FloatTensor([0,0,0,1])).view(1, -1)
target = Variable(torch.LongTensor([3]))

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
I would expect the loss to be 0. But I get:
Variable containing: 0.7437 [torch.FloatTensor of size 1]
As far as I know, cross entropy can be calculated like this:
H(p, q) = -\sum_x p(x) * log(q(x))
But shouldn't the result then be 1*log(1) = 0?
I also tried different inputs, like one-hot encoded targets, but those don't work at all, so it seems the input shape I am passing to the loss function is okay.
I would be really grateful if someone could help me out and tell me where my mistake is.
Thanks in advance!
This criterion computes the cross entropy loss between input and target. It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
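For instance, a minimal sketch of passing such a weight tensor (the three-class setup and the weight values here are made up for illustration):

import torch
import torch.nn as nn

# Hypothetical 3-class problem where class 2 is rare, so it gets a larger weight
class_weights = torch.tensor([0.2, 0.3, 0.5])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)            # batch of 4 samples, raw scores for 3 classes
targets = torch.tensor([0, 2, 1, 2])  # ground-truth class indices
print(criterion(logits, targets))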
Cross-entropy is the average number of bits needed to encode data from distribution A when using a code optimized for distribution B. In machine learning it comes up when a model produces predictions: the model is trained and evaluated by comparing its predicted distribution with the actual one.
Cross entropy can be used to define a loss function (cost function) in machine learning and optimization. It is defined on probability distributions, not single values. It works for classification because classifier output is (often) a probability distribution over class labels.
Cross entropy loss is a metric used to measure how well a classification model performs. The loss (or error) is a non-negative number, with 0 corresponding to a perfect model, and the goal is generally to get your model's loss as close to 0 as possible.
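To make the definition concrete, here is a small illustrative sketch (not the poster's code) of the textbook formula H(p, q) = -\sum_x p(x) * log(q(x)), where p is the true one-hot distribution and q is the predicted one; the distributions below are made up:

import torch

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log(q(x)); p and q are probability distributions
    return -(p * q.log()).sum()

p = torch.tensor([0., 0., 0., 1.])      # one-hot "true" distribution
q = torch.tensor([0.1, 0.2, 0.3, 0.4])  # some predicted probabilities (made up)
print(cross_entropy(p, q))              # tensor(0.9163), i.e. -log(0.4)

If q put all of its mass on class 3, the result would indeed be -1*log(1) = 0, which is what the question expects.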
In your example you are treating the output [0, 0, 0, 1] as probabilities, as required by the mathematical definition of cross entropy. But PyTorch treats them as raw outputs (scores) that don't need to sum to 1, and it first converts them into probabilities, for which it uses the softmax function.
So H(p, q) becomes H(p, softmax(output)).
Translating the output [0, 0, 0, 1] into probabilities:
softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]
whence:
-log(0.4754) = 0.7437
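These numbers can be reproduced with PyTorch itself; the following sketch (written with plain tensors rather than the Variable wrapper from the question) also shows that CrossEntropyLoss behaves like LogSoftmax followed by NLLLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F

output = torch.tensor([[0., 0., 0., 1.]])   # raw scores, shape (1, 4)
target = torch.tensor([3])                  # index of the correct class

probs = F.softmax(output, dim=1)
print(probs)                    # tensor([[0.1749, 0.1749, 0.1749, 0.4754]])
print(-torch.log(probs[0, 3]))  # tensor(0.7437), -log of the target class probability

# CrossEntropyLoss applied to the raw scores gives the same value
print(nn.CrossEntropyLoss()(output, target))               # tensor(0.7437)
print(nn.NLLLoss()(F.log_softmax(output, dim=1), target))  # tensor(0.7437)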
Your understanding is correct, but PyTorch doesn't compute cross entropy in that way. PyTorch uses the following formula:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j]))) = -x[class] + log(\sum_j exp(x[j]))
Since, in your scenario, x = [0, 0, 0, 1] and class = 3, if you evaluate the above expression you get:
loss(x, class) = -1 + log(exp(0) + exp(0) + exp(0) + exp(1)) = 0.7437
PyTorch uses the natural logarithm here.
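A quick sketch (again with plain tensors instead of Variable) that evaluates this formula directly and compares it with the criterion:

import torch
import torch.nn as nn

x = torch.tensor([0., 0., 0., 1.])   # raw scores for a single sample
cls = 3                              # target class index

# loss(x, class) = -x[class] + log(sum_j exp(x[j]))
manual = -x[cls] + torch.logsumexp(x, dim=0)
print(manual)                                          # tensor(0.7437)

criterion = nn.CrossEntropyLoss()
print(criterion(x.unsqueeze(0), torch.tensor([cls])))  # tensor(0.7437)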