I am learning deep learning and am trying to understand the PyTorch code given below. I'm struggling to understand how the probability calculation works. Can someone break it down in layman's terms? Thanks a ton.
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)
ps = model.forward(images[0,:])
I'm a layman so I'll help you with the layman's terms :)
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
These are parameters for the layers in your network. Each neural network consists of layers, and each layer has an input and an output shape.
Specifically, input_size deals with the input shape of the first layer. This is the input size of the entire network: each sample fed into the network will be a 1-dimensional vector of length 784 (an array 784 elements long).
hidden_sizes deals with the shapes inside the network. We will cover this a little later.
output_size deals with the output shape of the last layer. This means that our network will output a 1-dimensional vector of length 10 for each sample.
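To make those shapes concrete, here is a minimal sketch (not from the original post; it assumes the images are 28x28 grayscale, since 28 * 28 = 784) of one flattened sample:

import torch

# One fake single-channel 28x28 image (the 28x28 size is an assumption,
# chosen because 28 * 28 = 784 matches input_size above).
image = torch.randn(1, 28, 28)
flat = image.view(1, 784)   # flattened into the length-784 vector the network expects
print(flat.shape)           # torch.Size([1, 784])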
Now to break up model definition line by line:
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
The nn.Sequential part simply defines a network; each argument passed to it defines a new layer of that network, in that order.
nn.Linear(input_size, hidden_sizes[0]) is an example of such a layer. It is the first layer of our network: it takes an input of size input_size and outputs a vector of size hidden_sizes[0]. The size of that output is considered "hidden" in that it is not the input or the output of the whole network: it's located inside the network, away from the input and output ends that you interact with when you actually use it.
This is called Linear because it applies a linear transformation: it multiplies the input by its weight matrix and adds its bias vector to the result (y = Ax + b, where y = output, x = input, A = weights, b = bias).
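A minimal sketch (my own, not from the post) verifying that claim against PyTorch's actual nn.Linear:

import torch
import torch.nn as nn

# nn.Linear stores a weight matrix A and a bias vector b,
# and layer(x) computes exactly x @ A.T + b.
layer = nn.Linear(784, 128)
x = torch.randn(1, 784)                    # one flattened sample
manual = x @ layer.weight.T + layer.bias   # the linear transformation by hand
print(torch.allclose(layer(x), manual))    # True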
nn.ReLU(),
ReLU is an example of an activation function. An activation function applies some transformation to the output of the previous layer (the layer discussed above) and outputs the result of that transformation. In this case the function being used is ReLU, which is defined as ReLU(x) = max(x, 0). Activation functions are used in neural networks because they introduce non-linearities, which allows your model to capture non-linear relationships.
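A tiny sketch of ReLU in action:

import torch

# ReLU(x) = max(x, 0): negatives become 0, positives pass through unchanged.
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(x))   # tensor([0.0000, 0.0000, 0.0000, 1.5000])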
nn.Linear(hidden_sizes[0], hidden_sizes[1]),
From what we discussed above, this is another example of a layer. It takes an input of size hidden_sizes[0] (the same shape as the output of the previous layer) and outputs a 1D vector of length hidden_sizes[1].
nn.ReLU(),
Applies the ReLU function again.
nn.Linear(hidden_sizes[1], output_size)
Same as the above two layers, but the output shape is output_size this time.
nn.Softmax(dim=1))
Another activation function. This one turns the logits output by the last nn.Linear layer into an actual probability distribution: the 10 raw scores are rescaled so that each lies between 0 and 1 and they all sum to 1. This lets the model output a probability for each class. At this point our model is built.
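Since this is the probability calculation you asked about, here is a hand-rolled sketch (my own, with made-up logits for 3 classes) of what Softmax computes: softmax(x)_i = exp(x_i) / sum_j exp(x_j).

import torch

# Exponentiate each logit, then divide by the sum of the exponentials,
# so every entry lands in (0, 1) and the row sums to 1.
logits = torch.tensor([[2.0, 1.0, 0.1]])   # fake raw scores for 3 classes
probs = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
print(probs)        # tensor([[0.6590, 0.2424, 0.0986]])
print(probs.sum())  # tensor(1.), a valid probability distribution
print(torch.allclose(probs, torch.softmax(logits, dim=1)))  # True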
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)
These lines simply preprocess the training data and put it into the format the network expects (each image flattened into a length-784 vector).
ps = model.forward(images[0,:])
This passes one image through the model (a forward pass), applying the operations of each layer discussed above in sequence, and you get the resultant output: a vector of 10 class probabilities. (Note that calling model(images[0,:]) directly is the preferred way to run a forward pass; it invokes forward for you.)
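A sketch of that forward pass, reusing the model built above with a fake batch standing in for trainloader:

import torch

# Fake batch in the same (batch, 1, 784) shape the post produces with resize_.
fake_images = torch.randn(64, 1, 784)
ps = model(fake_images[0, :])   # shape (1, 784) in, shape (1, 10) out
print(ps.shape)                 # torch.Size([1, 10])
print(ps.sum())                 # approximately 1.0, the 10 probabilities sum to 1
print(ps.argmax(dim=1))         # index of the most likely class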