I have a problem classifying the MNIST dataset with a fully connected deep neural net with 2 hidden layers in PyTorch.
I want to use tanh as the activation in both hidden layers, but at the end I should use softmax.
For the loss, I am choosing nn.CrossEntropyLoss() in PyTorch, which (as I have found out) does not accept one-hot encoded labels as true labels, but takes a LongTensor of class indices instead.
My model is nn.Sequential(), and when I use softmax at the end, it gives me worse accuracy on the test data. Why?
import torch
from torch import nn

inputs, n_hidden0, n_hidden1, out = 784, 128, 64, 10
n_epochs = 500

model = nn.Sequential(
    nn.Linear(inputs, n_hidden0, bias=True),
    nn.Tanh(),
    nn.Linear(n_hidden0, n_hidden1, bias=True),
    nn.Tanh(),
    nn.Linear(n_hidden1, out, bias=True),
    nn.Softmax()  # SHOULD THIS BE THERE?
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.5)

for epoch in range(n_epochs):
    y_pred = model(X_train)
    loss = criterion(y_pred, Y_train)
    print('epoch: ', epoch + 1, ' loss: ', loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Here we have to be careful: nn.CrossEntropyLoss already applies LogSoftmax and then the negative log-likelihood (nn.LogSoftmax + nn.NLLLoss), so we must not add a softmax layer ourselves.
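As a minimal sketch (using the same layer sizes as in the question), the model should end with the final Linear layer and feed raw logits to the loss:

import torch
from torch import nn

inputs, n_hidden0, n_hidden1, out = 784, 128, 64, 10

# Same architecture as in the question, but without the final nn.Softmax:
# nn.CrossEntropyLoss applies log-softmax internally.
model = nn.Sequential(
    nn.Linear(inputs, n_hidden0, bias=True),
    nn.Tanh(),
    nn.Linear(n_hidden0, n_hidden1, bias=True),
    nn.Tanh(),
    nn.Linear(n_hidden1, out, bias=True),  # raw logits go straight into the loss
)
criterion = nn.CrossEntropyLoss()  # expects logits and a LongTensor of class indices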
Categorical cross-entropy loss is closely related to the softmax function, since in practice it is only used with networks that have a softmax layer at the output.
When you have a double softmax in the output layer, you effectively change the output function in such a way that the gradients propagated back through your network also change. Softmax with cross-entropy is the preferred combination precisely because of the clean gradients it produces.
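For reference, with p = softmax(z) over the logits z and a one-hot target y, the gradient of the combined softmax + cross-entropy loss with respect to the logits has the simple form

∂L/∂z_i = p_i − y_i

Stacking a second softmax in front of the loss destroys this simple form and shrinks the gradients.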
In short, softmax loss is actually just a softmax activation plus a cross-entropy loss. Softmax is an activation function that outputs a probability for each class, and these probabilities sum to one. Cross-entropy loss is then just the negative logarithm of the probability assigned to the true class, summed over the batch.
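A small sketch to illustrate the equivalence (the toy logits and targets here are made up for illustration):

import torch
import torch.nn.functional as F

# Toy logits for a batch of 2 samples and 3 classes, plus class-index targets.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 0.3]])
targets = torch.tensor([0, 1])  # LongTensor of class indices, not one-hot

# cross_entropy is log_softmax followed by nll_loss
loss_a = F.cross_entropy(logits, targets)
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_a, loss_b))  # True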
As stated in the torch.nn.CrossEntropyLoss() doc:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

Therefore, you should not use softmax before it.
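If you do want probabilities (for reporting or thresholding), apply softmax outside the training loss; for accuracy alone, argmax over the raw logits is enough. A sketch, assuming hypothetical X_test / Y_test tensors:

with torch.no_grad():
    logits = model(X_test)                 # model without the final nn.Softmax
    preds = logits.argmax(dim=1)           # same argmax as after a softmax
    accuracy = (preds == Y_test).float().mean().item()
    probs = torch.softmax(logits, dim=1)   # only if probabilities are actually needed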