 

PyTorch LogSoftmax vs Softmax for CrossEntropyLoss

I understand that PyTorch's LogSoftmax function is basically just a more numerically stable way to compute Log(Softmax(x)). Softmax lets you convert the output from a Linear layer into a categorical probability distribution.

The PyTorch documentation says that CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

Looking at NLLLoss, I'm still confused... are there two logs being used? I think of the negative log as the information content of an event (as in entropy).

After a bit more looking, I think that NLLLoss assumes that you're actually passing in log probabilities instead of just probabilities. Is this correct? It's kind of weird if so...

asked Dec 08 '20 by JacKeown



1 Answer

Yes, NLLLoss takes log-probabilities (log(softmax(x))) as input. Why? Because if you add an nn.LogSoftmax (or F.log_softmax) as the final layer of your model, you can easily get the probabilities using torch.exp(output), and to get the cross-entropy loss you can directly use nn.NLLLoss. Of course, log-softmax is more stable, as you said.
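For example, here's a minimal sketch of that pattern (the tensors, shapes, and class count are made up purely for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy values standing in for a model's final Linear layer output: 3 classes, batch of 2.
torch.manual_seed(0)
logits = torch.randn(2, 3)
target = torch.tensor([0, 2])

log_probs = F.log_softmax(logits, dim=1)   # log(softmax(x)), computed stably
probs = torch.exp(log_probs)               # recover ordinary probabilities when you need them
print(probs.sum(dim=1))                    # each row sums to ~1

loss = nn.NLLLoss()(log_probs, target)     # NLLLoss expects log-probabilities, not probabilities
print(loss)
```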

And there is only one log (it's in nn.LogSoftmax); there is no log in nn.NLLLoss.

nn.CrossEntropyLoss() combines nn.LogSoftmax() (log(softmax(x))) and nn.NLLLoss() in one single class. Therefore, what you pass into nn.CrossEntropyLoss needs to be the raw output of the network (the logits), not the output of a softmax.
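A minimal sketch of that equivalence, again with made-up tensors:

```python
import torch
import torch.nn as nn

# Same style of toy logits/targets as above, made up for illustration.
torch.manual_seed(0)
logits = torch.randn(2, 3)                 # raw network output -- do NOT softmax these first
target = torch.tensor([0, 2])

# Route 1: CrossEntropyLoss applied directly to the logits
ce = nn.CrossEntropyLoss()(logits, target)

# Route 2: LogSoftmax followed by NLLLoss
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

print(torch.allclose(ce, nll))             # True: the two routes compute the same loss
```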

answered Sep 29 '22 by kHarshit