
PyTorch: unable to understand the model.forward function

I am learning deep learning and am trying to understand the PyTorch code given below. I'm struggling to understand how the probability calculation works. Can someone break it down in layman's terms? Thanks a ton.

ps = model.forward(images[0,:])

from torch import nn

# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)

# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)
ps = model.forward(images[0,:])
asked Mar 05 '23 by imbecile_nl

1 Answer

I'm a layman so I'll help you with the layman's terms :)

input_size = 784
hidden_sizes = [128, 64]
output_size = 10

These are parameters for the layers in your network. Each neural network consists of layers, and each layer has an input and an output shape.

Specifically, input_size deals with the input shape of the first layer; this is the input size of the entire network. Each sample that goes into the network will be a 1-dimensional vector of length 784 (an array 784 elements long).

hidden_sizes deals with the shapes inside the network. We will cover these a little later.

output_size deals with the output shape of the last layer. This means that our network will output a 1-dimensional vector of length 10 for each sample.
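
To make these shapes concrete, here is a minimal sketch (the 28x28 image size is an assumption, based on 784 = 28 * 28 for MNIST-style data):

import torch

sample = torch.rand(28, 28)   # one fake grayscale image (assumed 28x28)
flat = sample.view(1, 784)    # a batch of 1 sample, each of length 784
print(flat.shape)             # torch.Size([1, 784])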

Now to break the model definition down line by line:

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),

The nn.Sequential part simply defines the network as a whole; each argument passed to it defines a new layer in that network, in that order.
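
As a rough sketch of what that means (using a hypothetical toy network, not the one from the question), passing an input through nn.Sequential is the same as calling each layer by hand, in order:

import torch
from torch import nn

toy = nn.Sequential(nn.Linear(4, 3), nn.ReLU())   # hypothetical toy network

x = torch.rand(1, 4)
out_a = toy(x)                       # Sequential applies the layers in order
out_b = torch.relu(toy[0](x))        # the same two steps done manually
print(torch.allclose(out_a, out_b))  # True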

nn.Linear(input_size, hidden_sizes[0]) is an example of such a layer. It is the first layer of our network: it takes in an input of size input_size and outputs a vector of size hidden_sizes[0]. The size of the output is considered "hidden" in that it is neither the input nor the output of the whole network. It is "hidden" because it is located inside the network, away from the input and output ends that you interact with when you actually use it.

This is called Linear because it applies a linear transformation: it multiplies the input by its weight matrix and adds its bias vector to the result (Y = Ax + b, where Y = output, x = input, A = weights, b = bias).
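
You can verify that description directly; this is a small sketch comparing nn.Linear to the written-out matrix math (the layer sizes just mirror the first layer of the question's network):

import torch
from torch import nn

layer = nn.Linear(784, 128)     # A is layer.weight, b is layer.bias
x = torch.rand(1, 784)

y_layer = layer(x)
y_manual = x @ layer.weight.T + layer.bias   # Y = Ax + b, written out
print(torch.allclose(y_layer, y_manual))     # True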

nn.ReLU(),

ReLU is an example of an activation function. What this function does is apply some sort of transformation to the output of the previous layer (the layer discussed above) and output the result of that transformation. In this case the function being used is ReLU, which is defined as ReLU(x) = max(x, 0). Activation functions are used in neural networks because they introduce non-linearities; without them, stacked linear layers would collapse into a single linear transformation, so your model could only learn linear relationships.
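
A quick sketch of ReLU(x) = max(x, 0) in action, with made-up numbers:

import torch

t = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(t))   # tensor([0.0000, 0.0000, 0.0000, 1.5000])

Note how every negative value is clipped to 0 while positive values pass through unchanged.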

nn.Linear(hidden_sizes[0], hidden_sizes[1]),

From what we discussed above, this is another example of a layer. It takes an input of size hidden_sizes[0] (the same shape as the output of the previous layer) and outputs a 1D vector of length hidden_sizes[1].

nn.ReLU(),

Applies the ReLU function again.

nn.Linear(hidden_sizes[1], output_size)

Same as the two layers above, but this time our output shape is output_size.

nn.Softmax(dim=1))

Another activation function. This one turns the logits output by the final nn.Linear layer into an actual probability distribution: each entry becomes a value between 0 and 1, and the entries along dim=1 (the class dimension) sum to 1. This lets the model output a probability for each class. At this point our model is built.
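
Here is a small sketch of what nn.Softmax(dim=1) does to one row of logits (the input numbers are made up):

import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores for 3 classes
probs = nn.Softmax(dim=1)(logits)

print(probs)         # tensor([[0.6590, 0.2424, 0.0986]])
print(probs.sum())   # tensor(1.) -- a valid probability distribution

The biggest logit gets the biggest probability, and the row sums to 1.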

# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)

These lines simply preprocess a batch of training data and put it into the format the network expects: each image in the batch is flattened into a length-784 vector.
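
Assuming a standard MNIST-style trainloader with batches of shape (64, 1, 28, 28), the reshape looks like this sketch (the batch size 64 is an assumption):

import torch

images = torch.rand(64, 1, 28, 28)         # stand-in for next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)    # flatten each 28x28 image to 784
print(images.shape)                        # torch.Size([64, 1, 784])

(view or reshape would be the more usual way to do this; resize_ just matches the question's code.)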

ps = model.forward(images[0,:])

This passes the first image through the model (a forward pass), applying the operations previously discussed layer by layer, and you get the resulting output: a probability for each of the 10 classes. (As an aside, the idiomatic way to write this in PyTorch is model(images[0,:]) rather than model.forward(...); calling the model directly also runs extra machinery such as registered hooks.)
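
Putting it all together, a sketch of what you can do with ps (the actual numbers depend on the randomly initialized weights):

ps = model(images[0,:])    # equivalent to model.forward(images[0,:])

print(ps.shape)            # torch.Size([1, 10]) -- one probability per class
print(ps.sum())            # ~1, since Softmax normalizes each row
print(ps.argmax(dim=1))    # the index of the most probable class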

answered Mar 10 '23 by Primusa