Understanding a simple LSTM pytorch

    import ipdb
    import torch
    import torch.autograd as autograd
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
    input = Variable(torch.randn(5, 3, 10))
    h0 = Variable(torch.randn(2, 3, 20))
    c0 = Variable(torch.randn(2, 3, 20))
    output, hn = rnn(input, (h0, c0))

This is the LSTM example from the docs. I don't understand the following things:

What is output-size, and why is it not specified anywhere?
Why does the input have 3 dimensions? What do 5 and 3 represent?
What do 2 and 3 represent in h0 and c0?

Edit:

    import ipdb
    import torch
    import torch.autograd as autograd
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    num_layers = 3
    num_hyperparams = 4
    batch = 1
    hidden_size = 20
    rnn = nn.LSTM(input_size=num_hyperparams, hidden_size=hidden_size, num_layers=num_layers)

    input = Variable(torch.randn(1, batch, num_hyperparams))  # (seq_len, batch, input_size)
    h0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)
    c0 = Variable(torch.randn(num_layers, batch, hidden_size))
    output, hn = rnn(input, (h0, c0))
    affine1 = nn.Linear(hidden_size, num_hyperparams)

    ipdb.set_trace()
    print(output.size())
    print(h0.size())

*** RuntimeError: matrices expected, got 3D, 2D tensors at

320

asked Jul 10 '17 22:07

Abhishek Bhatia

2 Answers

The output for the LSTM is the output for all the hidden nodes on the final layer.
hidden_size - the number of LSTM blocks per layer.
input_size - the number of input features per time-step.
num_layers - the number of hidden layers.
In total there are hidden_size * num_layers LSTM blocks.
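To see how input_size, hidden_size and num_layers show up in the module itself, here is a small sketch that just lists the parameter shapes of the docs example. The name param_shapes is mine, not something from the docs.

```python
import torch
import torch.nn as nn

input_size, hidden_size, num_layers = 10, 20, 2
rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

# Each layer stores its four gates (input, forget, cell, output) stacked
# along dim 0, so every weight has 4 * hidden_size rows.
# Layer 0 reads the input features; layer 1 reads layer 0's hidden state.
param_shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
```

The first layer's input weights have input_size columns, while the second layer's have hidden_size columns, since it consumes the hidden state of the layer below.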

The input dimensions are (seq_len, batch, input_size).
seq_len - the number of time steps in each input stream.
batch - the size of each batch of input sequences.

The hidden and cell dimensions are: (num_layers, batch, hidden_size)

output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t.

So there will be hidden_size * num_directions outputs. You didn't initialise the RNN to be bidirectional so num_directions is 1. So output_size = hidden_size.
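A quick way to check these shapes is to run the docs example and print them. This is only a sketch with the numbers from the question; it uses plain tensors instead of Variable, which should behave the same in recent PyTorch versions, and the variable names are mine.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size = 5, 3, 10
hidden_size, num_layers = 20, 2

rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
x = torch.randn(seq_len, batch, input_size)
h0 = torch.randn(num_layers, batch, hidden_size)
c0 = torch.randn(num_layers, batch, hidden_size)

# The second return value is a tuple: final hidden state and final cell state.
output, (h_n, c_n) = rnn(x, (h0, c0))

print(output.shape)  # (seq_len, batch, hidden_size)
print(h_n.shape)     # (num_layers, batch, hidden_size)
print(c_n.shape)     # (num_layers, batch, hidden_size)

# For a non-bidirectional LSTM, the last time step of output is the
# final hidden state of the top layer.
last_step_matches = torch.allclose(output[-1], h_n[-1])
print(last_step_matches)
```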

Edit: You can change the number of outputs by using a linear layer:

    out_rnn, hn = rnn(input, (h0, c0))
    lin = nn.Linear(hidden_size, output_size)
    # torch.nn has no View module; reshape the tensor directly instead
    flat = out_rnn.reshape(seq_len * batch, hidden_size)
    output = lin(flat).reshape(seq_len, batch, output_size)
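As far as I know, recent PyTorch versions let nn.Linear act on the last dimension of a 3D tensor, so the reshaping is optional there. A minimal sketch comparing both routes, with output_size = 4 picked arbitrarily for illustration:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size = 5, 3, 10
hidden_size, output_size = 20, 4  # output_size is an arbitrary choice

rnn = nn.LSTM(input_size, hidden_size, num_layers=2)
lin = nn.Linear(hidden_size, output_size)

# Initial states default to zeros when they are not passed in.
out_rnn, _ = rnn(torch.randn(seq_len, batch, input_size))

# Route 1: apply the linear layer directly to the 3D output.
direct = lin(out_rnn)

# Route 2: flatten to 2D, apply the layer, then restore the 3D shape.
flat = lin(out_rnn.reshape(seq_len * batch, hidden_size))
reshaped = flat.reshape(seq_len, batch, output_size)

print(direct.shape)
print(torch.allclose(direct, reshaped, atol=1e-6))
```

On older versions that raise "matrices expected, got 3D, 2D tensors", the second route is the one to use.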

Note: for this answer I assumed that we're only talking about non-bidirectional LSTMs.

Source: PyTorch docs.

130

answered Oct 05 '22 02:10

cdo256

Answer by cdo256 is almost correct. He is mistaken when referring to what hidden_size means. He explains it as:

hidden_size - the number of LSTM blocks per layer.

but really, here is a better explanation:

Each sigmoid, tanh or hidden state layer in the cell is actually a set of nodes, whose number is equal to the hidden layer size. Therefore each of the “nodes” in the LSTM cell is actually a cluster of normal neural network nodes, as in each layer of a densely connected neural network. Hence, if you set hidden_size = 10, then each one of your LSTM blocks, or cells, will have neural networks with 10 nodes in them. The total number of LSTM blocks in your LSTM model will be equivalent to that of your sequence length.

This can be seen by analyzing the differences in examples between nn.LSTM and nn.LSTMCell:

https://pytorch.org/docs/stable/nn.html#torch.nn.LSTM

and

https://pytorch.org/docs/stable/nn.html#torch.nn.LSTMCell
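To make the nn.LSTM vs nn.LSTMCell comparison concrete, here is a sketch (my own, not from the docs) that copies the weights of a single-layer nn.LSTM into an nn.LSTMCell and applies the cell once per time step. The same cell is reused seq_len times, and hidden_size only sets the width of the state vectors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

lstm = nn.LSTM(input_size, hidden_size, num_layers=1)
cell = nn.LSTMCell(input_size, hidden_size)

# Share the single layer's weights with the cell.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)

with torch.no_grad():
    out_lstm, _ = lstm(x)  # zero initial states by default
    steps = []
    for t in range(seq_len):
        # One cell application per time step, carrying (h, c) forward.
        h, c = cell(x[t], (h, c))
        steps.append(h)
    out_cell = torch.stack(steps)  # (seq_len, batch, hidden_size)

outputs_match = torch.allclose(out_lstm, out_cell, atol=1e-5)
print(outputs_match)
```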

answered Oct 05 '22 01:10

Lsehovac

Tags: neural-network, lstm, pytorch, rnn
