 

Should I use softmax as output when using cross entropy loss in pytorch?


I have a problem classifying the MNIST dataset with a fully connected deep neural net with 2 hidden layers in PyTorch.

I want to use tanh as the activation in both hidden layers, but at the output I should use softmax.

For the loss, I am choosing nn.CrossEntropyLoss() in PyTorch, which (as I have found out) does not want to take one-hot encoded labels as true labels, but takes a LongTensor of class indices instead.
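For example, this is the target format it seems to expect (a minimal sketch with made-up tensors, class indices instead of one-hot vectors):

import torch
from torch import nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)           # raw scores for a batch of 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # class indices as a LongTensor, not one-hot
loss = criterion(logits, targets)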

My model is an nn.Sequential(), and when I use softmax at the end, it gives worse accuracy on the test data. Why?

import torch
from torch import nn

inputs, n_hidden0, n_hidden1, out = 784, 128, 64, 10
n_epochs = 500

model = nn.Sequential(
    nn.Linear(inputs, n_hidden0, bias=True),
    nn.Tanh(),
    nn.Linear(n_hidden0, n_hidden1, bias=True),
    nn.Tanh(),
    nn.Linear(n_hidden1, out, bias=True),
    nn.Softmax()  # SHOULD THIS BE THERE?
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.5)

for epoch in range(n_epochs):
    y_pred = model(X_train)
    loss = criterion(y_pred, Y_train)
    print('epoch: ', epoch+1, ' loss: ', loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
asked Apr 14 '19 by pikachu

People also ask

Does PyTorch cross entropy loss apply softmax?

We have to be careful here because the cross-entropy loss already applies LogSoftmax followed by the negative log-likelihood (nn.LogSoftmax + nn.NLLLoss). We must not add a softmax layer ourselves.
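A minimal sketch (random tensors, assumed shapes) illustrating that equivalence:

import torch
from torch import nn

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))

# CrossEntropyLoss on raw logits ...
ce = nn.CrossEntropyLoss()(logits, targets)
# ... matches LogSoftmax followed by NLLLoss
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))  # True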

Does cross entropy loss use softmax?

Categorical cross-entropy loss is closely related to the softmax function, since it's practically only used with networks with a softmax layer at the output.

Should I use softmax before cross entropy?

When you have a double softmax in the output layer, you basically change the output function in such a way that it changes the gradients propagated to your network. The softmax with cross entropy is a preferred loss function due to the gradients it produces.
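A small sketch (made-up scores) of the double-softmax effect: applying softmax before the loss squashes the inputs that the internal log-softmax sees, so the gradients shrink compared to feeding raw logits.

import torch
import torch.nn.functional as F

target = torch.tensor([0])

# Correct: loss on raw logits (cross_entropy applies log-softmax internally)
logits = torch.tensor([[2.0, -1.0, 0.5]], requires_grad=True)
F.cross_entropy(logits, target).backward()
print(logits.grad)

# Double softmax: the same scores are passed through softmax before the loss,
# so the internal log-softmax sees much flatter inputs and the gradients are smaller
scores = torch.tensor([[2.0, -1.0, 0.5]], requires_grad=True)
F.cross_entropy(F.softmax(scores, dim=1), target).backward()
print(scores.grad)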

Is softmax loss same as cross entropy loss?

In short, Softmax Loss is actually just a Softmax Activation plus a Cross-Entropy Loss. Softmax is an activation function that outputs the probability for each class, and these probabilities sum to one. Cross-entropy loss is just the negative logarithm of the predicted probability of the true class, summed over the samples.
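A sketch with made-up numbers, computing the loss by hand and comparing it to the built-in version:

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.5, 0.3, -2.0]])
target = torch.tensor([0])

probs = F.softmax(logits, dim=1)        # probabilities, sum to 1
manual = -torch.log(probs[0, target])   # negative log probability of the true class
builtin = F.cross_entropy(logits, target)
print(probs.sum().item(), manual.item(), builtin.item())  # 1.0, then two equal loss values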




1 Answer

As stated in the torch.nn.CrossEntropyLoss() doc:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

Therefore, you should not apply softmax before it; pass the raw logits to the loss.
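Applied to the model in the question, a sketch (same layer sizes) would look like this; if you need probabilities at inference time, apply softmax explicitly outside the loss:

from torch import nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(784, 128, bias=True),
    nn.Tanh(),
    nn.Linear(128, 64, bias=True),
    nn.Tanh(),
    nn.Linear(64, 10, bias=True),
    # no nn.Softmax() here: CrossEntropyLoss applies LogSoftmax itself
)

# At inference time, if you want probabilities:
# probs = F.softmax(model(x), dim=1)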

answered Sep 21 '22 by Berriel