In PyTorch, a classification network model is defined like this:
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)  # hidden layer
        self.out = torch.nn.Linear(n_hidden, n_output)      # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))  # activation function for hidden layer
        x = self.out(x)             # raw logits
        return x
Is softmax applied here? In my understanding, it should look like this:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)  # hidden layer
        self.relu = torch.nn.ReLU(inplace=True)
        self.out = torch.nn.Linear(n_hidden, n_output)      # output layer
        self.softmax = torch.nn.Softmax(dim=1)  # dim=1: softmax over the class dimension

    def forward(self, x):
        x = self.hidden(x)
        x = self.relu(x)    # activation function for hidden layer
        x = self.out(x)
        x = self.softmax(x)
        return x
I understand that F.relu(self.hidden(x)) applies ReLU just like self.relu(x) does in my version, but the first block of code doesn't apply softmax, right?
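A quick sanity check (just a sketch, with made-up sizes and a random batch): if softmax were applied in the first version, every output row would sum to 1.

    net = Net(n_feature=4, n_hidden=8, n_output=3)
    x = torch.randn(5, 4)       # hypothetical batch of 5 samples with 4 features
    print(net(x).sum(dim=1))    # rows generally do not sum to 1, so no softmax is applied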
From the PyTorch documentation for torch.nn.Softmax: it applies the Softmax function to an n-dimensional input Tensor, rescaling the elements so that they lie in the range [0, 1] and sum to 1. When the input Tensor is a sparse tensor, the unspecified values are treated as -inf.
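As a small illustration of that definition (a sketch with made-up numbers):

    import torch

    m = torch.nn.Softmax(dim=1)
    logits = torch.tensor([[1.0, 2.0, 3.0],
                           [0.5, 0.5, 0.5]])
    probs = m(logits)
    print(probs)             # every element lies in [0, 1]
    print(probs.sum(dim=1))  # tensor([1., 1.]) -- each row sums to 1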
But softmax by itself is actually numerically stable: implementations such as PyTorch's use the max trick, subtracting the row-wise maximum before exponentiating, so the exponentials cannot overflow.
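The max trick rests on the identity softmax(x) = softmax(x - max(x)). A minimal sketch (with deliberately huge made-up values) of why the shift matters:

    import torch
    import torch.nn.functional as F

    x = torch.tensor([[1000.0, 1001.0, 1002.0]])    # naive exp() overflows to inf here
    naive = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
    stable = F.softmax(x, dim=1)                    # internally shifts by the row maximum
    print(naive)   # tensor([[nan, nan, nan]])
    print(stable)  # tensor([[0.0900, 0.2447, 0.6652]])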
PyTorch's torch.nn.Linear(n, m) is a module that creates a single-layer feed-forward network with n inputs and m outputs. Mathematically, it computes y = xA^T + b, where x is the input, A is the learned weight matrix, and b is the learned bias.
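A quick shape check (sketch, with arbitrary sizes):

    import torch

    layer = torch.nn.Linear(4, 2)   # n=4 inputs, m=2 outputs
    x = torch.randn(5, 4)           # batch of 5 samples
    y = layer(x)                    # y = x @ layer.weight.T + layer.bias
    print(layer.weight.shape)       # torch.Size([2, 4])
    print(layer.bias.shape)         # torch.Size([2])
    print(y.shape)                  # torch.Size([5, 2])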
Short answer: generally, you don't need to apply softmax if you don't need probabilities, and using raw logits leads to more numerically stable code. Long answer: first of all, the inputs of the softmax layer are called logits.
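Concretely, that means feeding the raw logits to the loss and converting to probabilities only when you actually need them (a sketch with random logits and hypothetical labels):

    import torch

    torch.manual_seed(0)
    logits = torch.randn(5, 3)               # pretend these are the raw outputs of net(x)
    targets = torch.tensor([0, 2, 1, 0, 2])  # hypothetical class labels

    criterion = torch.nn.CrossEntropyLoss()  # expects raw logits, applies log-softmax internally
    loss = criterion(logits, targets)

    probs = torch.softmax(logits, dim=1)     # convert to probabilities only when you need them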
Latching on to what @jodag was already saying in his comment, and extending it a bit to form a full answer:
No, PyTorch does not automatically apply softmax, and you can at any point apply torch.nn.Softmax() as you want. But softmax has some issues with numerical stability, which we want to avoid as much as possible. One solution is to use log-softmax, but this tends to be slower than a direct computation.
Especially when we are using negative log-likelihood as a loss function (in PyTorch, this is torch.nn.NLLLoss), we can exploit the fact that the derivative of (log-)softmax + NLL is mathematically quite nice and simple, which is why it makes sense to combine both into a single function/element. The result is torch.nn.CrossEntropyLoss.
Again, note that this only applies directly to the last layer of your network; any other computation is not affected by this.
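To make the equivalence concrete, here is a small sketch (random logits, hypothetical labels) showing that torch.nn.CrossEntropyLoss on raw logits matches torch.nn.NLLLoss applied to log-softmax outputs:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(5, 3)               # raw network outputs, no softmax applied
    targets = torch.tensor([0, 2, 1, 0, 2])  # hypothetical class labels

    ce = torch.nn.CrossEntropyLoss()(logits, targets)
    nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
    print(torch.allclose(ce, nll))           # True -- the two formulations match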