I have the following code for PyTorch:
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        x = F.sigmoid(self.hidden(x))
        x = F.softmax(self.output(x), dim=1)
        return x
My question: what is this self.hidden? It is returned by nn.Linear and it can take x as an argument. What exactly is the purpose of self.hidden?
PyTorch - nn.Linear: nn.Linear(n, m) is a module that creates a single-layer feed-forward network with n inputs and m outputs. Mathematically, it computes the linear equation y = x*W^T + b, where x is the input, b is the bias, and W is the weight matrix; this is where the name 'Linear' comes from.

It is also the same as a Keras Dense layer without activation: model.add(Dense(10, activation=None)) and nn.Linear(128, 10) behave alike, because neither applies an activation; if you don't specify one, no activation is applied.

nn.Linear itself applies no activation function; it only prepares and applies the weight matrix and bias. An activation such as nn.ReLU is added as a separate step so the network can fit complex, non-linear data.

torch.nn contains different classes that help you build neural network models. All models in PyTorch inherit from the subclass nn.Module, which has useful methods like parameters(), __call__() and others. The torch.nn module also provides the various layers you can use to build your neural network.
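To illustrate these points, here is a minimal sketch (the class name Tiny and the sizes 128 and 10 are chosen purely for illustration):

import torch
import torch.nn as nn

class Tiny(nn.Module):  # every model subclasses nn.Module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 10)  # 128 inputs, 10 outputs, no activation

    def forward(self, x):
        return self.fc(x)

model = Tiny()
for p in model.parameters():  # parameters() yields weight (10, 128) and bias (10,)
    print(p.shape)
out = model(torch.randn(4, 128))  # __call__ runs forward(); out has shape (4, 10)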
What is the class definition of nn.Linear in PyTorch?
From the documentation:
CLASS torch.nn.Linear(in_features, out_features, bias=True)
Applies a linear transformation to the incoming data: y = x*W^T + b
Parameters:
in_features - size of each input sample
out_features - size of each output sample
bias - if set to False, the layer will not learn an additive bias. Default: True
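For example, the bias flag can be checked directly (a small sketch with arbitrary sizes):

import torch.nn as nn

no_bias = nn.Linear(4, 3, bias=False)
print(no_bias.bias)          # None -> no additive bias is learned
print(no_bias.weight.shape)  # torch.Size([3, 4])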
Note that the weights W have shape (out_features, in_features) and the biases b have shape (out_features). They are initialized randomly and can be changed later (e.g. during the training of a Neural Network they are updated by some optimization algorithm).
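A quick sketch of the shapes and of an optimizer update (the sizes, learning rate and dummy loss here are arbitrary):

import torch
import torch.nn as nn

layer = nn.Linear(784, 256)
print(layer.weight.shape)  # torch.Size([256, 784]) -> (out_features, in_features)
print(layer.bias.shape)    # torch.Size([256])      -> (out_features,)

opt = torch.optim.SGD(layer.parameters(), lr=0.01)
loss = layer(torch.randn(32, 784)).pow(2).mean()  # dummy loss, just for illustration
loss.backward()
opt.step()  # weight and bias are updated in place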
In your Neural Network, self.hidden = nn.Linear(784, 256) defines a hidden (meaning that it is between the input and output layers), fully connected linear layer, which takes an input x of shape (batch_size, 784), where batch_size is the number of inputs (each of size 784) that are passed to the network at once (as a single tensor), and transforms it by the linear equation y = x*W^T + b into a tensor y of shape (batch_size, 256). It is further transformed by the sigmoid function, x = F.sigmoid(self.hidden(x)) (which is not a part of nn.Linear but an additional step).
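To see those shapes concretely (a sketch; the batch size of 64 is arbitrary, and torch.sigmoid is used since F.sigmoid is deprecated):

import torch
import torch.nn as nn

hidden = nn.Linear(784, 256)
x = torch.randn(64, 784)  # a batch of 64 inputs, each of size 784
h = hidden(x)             # linear step: shape (64, 256)
h = torch.sigmoid(h)      # activation: a separate step, not part of nn.Linear
print(h.shape)            # torch.Size([64, 256])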
Let's see a concrete example:
import torch
import torch.nn as nn

x = torch.tensor([[1.0, -1.0],
                  [0.0,  1.0],
                  [0.0,  0.0]])

in_features = x.shape[1]  # = 2
out_features = 2

m = nn.Linear(in_features, out_features)
where x contains three inputs (i.e. the batch size is 3), x[0], x[1] and x[2], each of size 2, and the output is going to be of shape (batch_size, out_features) = (3, 2).
The values of the parameters (weights and biases) are:
>>> m.weight
tensor([[-0.4500,  0.5856],
        [-0.1807, -0.4963]])
>>> m.bias
tensor([ 0.2223, -0.6114])
(since they are initialized randomly, you will most likely get values different from the above)
The output is:
>>> y = m(x)
>>> y
tensor([[-0.8133, -0.2959],
        [ 0.8079, -1.1077],
        [ 0.2223, -0.6114]])
and (behind the scenes) it is computed as:
y = x.matmul(m.weight.t()) + m.bias # y = x*W^T + b
i.e.
y[i,j] == x[i,0] * m.weight[j,0] + x[i,1] * m.weight[j,1] + m.bias[j]
where i is in the interval [0, batch_size) and j in [0, out_features).
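You can verify this equivalence directly (a quick check reusing m and x from the example above):

manual = x.matmul(m.weight.t()) + m.bias  # y = x*W^T + b
print(torch.allclose(m(x), manual))       # True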
The Network is defined as having two layers, hidden and output. Roughly speaking, the function of the hidden layer is to hold parameters you can optimize during training.
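For example, listing the learnable parameters of the Network from the question (a quick sketch):

net = Network()
for name, p in net.named_parameters():
    print(name, tuple(p.shape))
# hidden.weight (256, 784)
# hidden.bias (256,)
# output.weight (10, 256)
# output.bias (10,)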