
What is the class definition of nn.Linear in PyTorch?

I have the following code for PyTorch:

import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        x = F.sigmoid(self.hidden(x))
        x = F.softmax(self.output(x), dim=1)

        return x

My question: What is this self.hidden?

It is returned by nn.Linear, and it can take x as an argument. What exactly is the purpose of self.hidden?

Asked by jason, Feb 27 '19 23:02



2 Answers

What is the class definition of nn.Linear in pytorch?

From the documentation:


CLASS torch.nn.Linear(in_features, out_features, bias=True)

Applies a linear transformation to the incoming data: y = x*W^T + b

Parameters:

  • in_features – size of each input sample (i.e. size of x)
  • out_features – size of each output sample (i.e. size of y)
  • bias – If set to False, the layer will not learn an additive bias. Default: True

Note that the weights W have shape (out_features, in_features) and the biases b have shape (out_features,). They are initialized randomly and can be changed later (e.g. during the training of a neural network they are updated by an optimization algorithm).
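The parameter shapes can be checked directly. This is a minimal sketch, assuming PyTorch is installed; the sizes 784 and 256 are taken from the question's code, and the variable names are only illustrative:

```python
import torch.nn as nn

layer = nn.Linear(784, 256)

# weight is stored as (out_features, in_features), bias as (out_features,)
weight_shape = tuple(layer.weight.shape)
bias_shape = tuple(layer.bias.shape)
print(weight_shape)  # (256, 784)
print(bias_shape)    # (256,)

# With bias=False the layer has no bias parameter
no_bias = nn.Linear(784, 256, bias=False)
print(no_bias.bias)  # None
```

Storing the weight as (out_features, in_features) is why the formula uses the transpose W^T.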

In your neural network, self.hidden = nn.Linear(784, 256) defines a hidden, fully connected linear layer (hidden meaning that it sits between the input and output layers). It takes an input x of shape (batch_size, 784), where batch_size is the number of inputs (each of size 784) passed to the network at once as a single tensor, and transforms it by the linear equation y = x*W^T + b into a tensor y of shape (batch_size, 256). The result is then transformed by the sigmoid function, x = F.sigmoid(self.hidden(x)), which is not part of nn.Linear but an additional step.
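To make the shape changes concrete, here is a rough sketch that pushes a random batch through two layers of the same sizes as in the question. The batch size of 64 is an arbitrary choice for illustration, and torch.sigmoid and torch.softmax are used in place of the F.sigmoid and F.softmax calls from the question (they compute the same functions):

```python
import torch
import torch.nn as nn

hidden = nn.Linear(784, 256)
output = nn.Linear(256, 10)

batch_size = 64  # arbitrary value, chosen only for this example
x = torch.randn(batch_size, 784)

# Linear layer followed by a separate activation step
h = torch.sigmoid(hidden(x))             # shape (64, 256)
probs = torch.softmax(output(h), dim=1)  # shape (64, 10), each row sums to 1

print(h.shape, probs.shape)
```

Only the last dimension changes at each nn.Linear call; the batch dimension is carried through untouched.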

Let's see a concrete example:

import torch
import torch.nn as nn

x = torch.tensor([[1.0, -1.0],
                  [0.0,  1.0],
                  [0.0,  0.0]])

in_features = x.shape[1]  # = 2
out_features = 2

m = nn.Linear(in_features, out_features)

where x contains three inputs (i.e. the batch size is 3), x[0], x[1] and x[2], each of size 2, and the output is going to be of shape (batch_size, out_features) = (3, 2).

The values of the parameters (weights and biases) are:

>>> m.weight
tensor([[-0.4500,  0.5856],
        [-0.1807, -0.4963]])

>>> m.bias
tensor([ 0.2223, -0.6114])

(Because the parameters are initialized randomly, you will most likely get different values.)

The output is:

>>> y = m(x)
>>> y
tensor([[-0.8133, -0.2959],
        [ 0.8079, -1.1077],
        [ 0.2223, -0.6114]])

and (behind the scenes) it is computed as:

y = x.matmul(m.weight.t()) + m.bias  # y = x*W^T + b 

i.e.

y[i,j] == x[i,0] * m.weight[j,0] + x[i,1] * m.weight[j,1] + m.bias[j] 

where i is in the interval [0, batch_size) and j is in [0, out_features).
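The equivalence can be verified numerically. The sketch below overwrites the random initialization with the weight and bias values printed earlier in this answer, so the result is reproducible, then compares m(x) against the manual formula:

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, -1.0],
                  [0.0,  1.0],
                  [0.0,  0.0]])

m = nn.Linear(2, 2)

# Replace the random initial values with the ones shown in the answer
with torch.no_grad():
    m.weight.copy_(torch.tensor([[-0.4500,  0.5856],
                                 [-0.1807, -0.4963]]))
    m.bias.copy_(torch.tensor([0.2223, -0.6114]))

y = m(x)
manual = x.matmul(m.weight.t()) + m.bias  # y = x*W^T + b

# Element-wise form for a single entry, here i = 0, j = 1
i, j = 0, 1
entry = x[i, 0] * m.weight[j, 0] + x[i, 1] * m.weight[j, 1] + m.bias[j]

same = torch.allclose(y, manual)
print(same)  # True
```

Copying values under torch.no_grad() is just a convenient way to pin the parameters for this check; in normal use they are left to the optimizer.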

Answered by Andreas K., Oct 06 '22 19:10


The Network is defined as having two layers, hidden and output. Roughly speaking, the function of the hidden layer is to hold parameters that you can optimize during training.
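As a rough illustration of "holding parameters", the sketch below lists the tensors owned by a layer of the same size as self.hidden and shows that one optimizer step changes them. The SGD optimizer, learning rate, and squared-output loss are made up for the example and are not from the question:

```python
import torch
import torch.nn as nn

hidden = nn.Linear(784, 256)

# The layer owns two learnable tensors
names = [name for name, _ in hidden.named_parameters()]
num_params = sum(p.numel() for p in hidden.parameters())
print(names)       # ['weight', 'bias']
print(num_params)  # 784 * 256 + 256

# Hypothetical training setup, chosen only for illustration
optimizer = torch.optim.SGD(hidden.parameters(), lr=0.1)
before = hidden.weight.detach().clone()

x = torch.randn(8, 784)
loss = hidden(x).pow(2).mean()  # made-up loss
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The stored weights have been updated in place
changed = not torch.equal(before, hidden.weight.detach())
print(changed)
```

Because Network assigns the layer to self.hidden, nn.Module registers these parameters automatically, so they also appear in Network().parameters().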

Answered by Sergii Dymchenko, Oct 06 '22 19:10