    import torch, ipdb
    import torch.autograd as autograd
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
    input = Variable(torch.randn(5, 3, 10))
    h0 = Variable(torch.randn(2, 3, 20))
    c0 = Variable(torch.randn(2, 3, 20))
    output, hn = rnn(input, (h0, c0))
This is the LSTM example from the docs. I don't understand the following things:
Edit:
    import torch, ipdb
    import torch.autograd as autograd
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    num_layers = 3
    num_hyperparams = 4
    batch = 1
    hidden_size = 20

    rnn = nn.LSTM(input_size=num_hyperparams, hidden_size=hidden_size, num_layers=num_layers)

    input = Variable(torch.randn(1, batch, num_hyperparams))    # (seq_len, batch, input_size)
    h0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)
    c0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)
    output, hn = rnn(input, (h0, c0))
    affine1 = nn.Linear(hidden_size, num_hyperparams)

    ipdb.set_trace()
    print(output.size())
    print(h0.size())
*** RuntimeError: matrices expected, got 3D, 2D tensors at
LSTM Explained
An LSTM is a type of recurrent neural network (RNN) that can learn long-term dependencies, especially in sequence prediction problems. It has feedback connections, so it can process entire sequences of data rather than only single data points such as images.
The weight matrix W contains separate weights for the current input vector and for the previous hidden state, for each gate. Like other recurrent neural networks, an LSTM produces an output at each time step, and these outputs are used to train the network with gradient descent.
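To make the per-gate weights concrete, here is a small sketch that inspects the parameters of a one-layer nn.LSTM. The sizes (10 and 20) are only illustrative and the variable names are mine; in PyTorch the four gates are stacked along the first dimension of each weight matrix.

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20  # illustrative sizes, not from the question
rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=1)

# The four gates (input, forget, cell, output) are stacked along dim 0,
# so each weight matrix has 4 * hidden_size rows.
w_ih = rnn.weight_ih_l0  # weights applied to the current input x_t
w_hh = rnn.weight_hh_l0  # weights applied to the previous hidden state h_{t-1}
print(w_ih.shape)  # torch.Size([80, 10])
print(w_hh.shape)  # torch.Size([80, 20])

# Split the stacked input weights back into one block per gate.
w_ii, w_if, w_ig, w_io = w_ih.chunk(4, dim=0)
print(w_ii.shape)  # torch.Size([20, 10])
```

So hidden_size sets the width of every gate, which is why it shows up in the row count of both matrices.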
The output of the PyTorch LSTM layer is a tuple with two elements. The first element is the LSTM's output for all time steps (hᵗ : ∀t = 1, 2, … T), with shape (timesteps, batch, output_features). The second element is another tuple with two elements: the final hidden state and the final cell state.
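A quick way to check the structure of the return value is to unpack it and print the shapes. This is a minimal sketch using the sizes from the docs example; the names x, h_n and c_n are my own choice.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
rnn = nn.LSTM(input_size, hidden_size, num_layers)
x = torch.randn(seq_len, batch, input_size)

# When no initial state is passed, h0 and c0 default to zeros.
output, (h_n, c_n) = rnn(x)

print(output.shape)  # torch.Size([5, 3, 20])  one vector per time step, last layer only
print(h_n.shape)     # torch.Size([2, 3, 20])  final hidden state of every layer
print(c_n.shape)     # torch.Size([2, 3, 20])  final cell state of every layer

# For a unidirectional LSTM, the last time step of `output` is the
# final hidden state of the top layer.
same = torch.allclose(output[-1], h_n[-1])
print(same)
```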
The output for the LSTM is the output for all the hidden nodes on the final layer.
hidden_size - the number of LSTM blocks per layer.
input_size - the number of input features per time-step.
num_layers - the number of hidden layers.
In total there are hidden_size * num_layers LSTM blocks.
The input dimensions are (seq_len, batch, input_size).
seq_len - the number of time steps in each input stream.
batch - the size of each batch of input sequences.
The hidden and cell dimensions are: (num_layers, batch, hidden_size)
output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t.
So there will be hidden_size * num_directions outputs. You didn't initialise the RNN to be bidirectional so num_directions is 1. So output_size = hidden_size.
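If you want to see where num_directions comes in, the sketch below builds the same LSTM twice, once with bidirectional=True, and compares the shapes. The sizes are the ones from the docs example; the variable names are just for illustration.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
x = torch.randn(seq_len, batch, input_size)

uni = nn.LSTM(input_size, hidden_size, num_layers)                     # num_directions = 1
bi = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)  # num_directions = 2

out_uni, (h_uni, _) = uni(x)
out_bi, (h_bi, _) = bi(x)

print(out_uni.shape)  # torch.Size([5, 3, 20])  hidden_size * 1
print(out_bi.shape)   # torch.Size([5, 3, 40])  hidden_size * 2
print(h_uni.shape)    # torch.Size([2, 3, 20])  num_layers * 1
print(h_bi.shape)     # torch.Size([4, 3, 20])  num_layers * 2
```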
Edit: You can change the number of outputs by using a linear layer:
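    out_rnn, hn = rnn(input, (h0, c0))
    lin = nn.Linear(hidden_size, output_size)
    # PyTorch has no nn.View module; reshape with Tensor.view instead.
    flat = out_rnn.view(seq_len * batch, hidden_size)      # merge time and batch dims
    output = lin(flat).view(seq_len, batch, output_size)   # restore (seq_len, batch, output_size)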
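In recent PyTorch versions nn.Linear acts on the last dimension of its input, so the reshaping is optional. The sketch below, with made-up sizes and an output_size of 4 chosen only for illustration, runs both variants and compares them. On the old version that raised "matrices expected, got 3D, 2D tensors" you would still need the explicit reshape.

```python
import torch
import torch.nn as nn

# Illustrative sizes; output_size is an arbitrary choice for this sketch.
seq_len, batch, input_size, hidden_size, output_size = 5, 3, 10, 20, 4
rnn = nn.LSTM(input_size, hidden_size)
lin = nn.Linear(hidden_size, output_size)

x = torch.randn(seq_len, batch, input_size)
out_rnn, _ = rnn(x)  # (seq_len, batch, hidden_size)

# Variant 1: flatten time and batch, apply the layer, then restore the shape.
flat = out_rnn.reshape(seq_len * batch, hidden_size)
out_reshaped = lin(flat).reshape(seq_len, batch, output_size)

# Variant 2: apply the layer directly; it maps the last dimension only.
out_direct = lin(out_rnn)

print(out_direct.shape)  # torch.Size([5, 3, 4])
agree = torch.allclose(out_reshaped, out_direct, atol=1e-5)
print(agree)
```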
Note: for this answer I assumed that we're only talking about non-bidirectional LSTMs.
Source: PyTorch docs.
Answer by cdo256 is almost correct. He is mistaken when referring to what hidden_size means. He explains it as:
hidden_size - the number of LSTM blocks per layer.
but really, here is a better explanation:
Each sigmoid, tanh or hidden state layer in the cell is actually a set of nodes, whose number is equal to the hidden layer size. Therefore each of the “nodes” in the LSTM cell is actually a cluster of normal neural network nodes, as in each layer of a densely connected neural network. Hence, if you set hidden_size = 10, then each one of your LSTM blocks, or cells, will have neural networks with 10 nodes in them. The total number of LSTM blocks in your LSTM model will be equivalent to that of your sequence length.
This can be seen by analyzing the differences in examples between nn.LSTM and nn.LSTMCell:
https://pytorch.org/docs/stable/nn.html#torch.nn.LSTM
and
https://pytorch.org/docs/stable/nn.html#torch.nn.LSTMCell
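One way to check this for yourself is to unroll an nn.LSTMCell by hand and compare it with a one-layer nn.LSTM that shares the same weights. This is only a sketch under that setup, with sizes taken from the docs example: the same cell is applied once per time step, and hidden_size is the width of the vector it carries forward.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

lstm = nn.LSTM(input_size, hidden_size, num_layers=1)
cell = nn.LSTMCell(input_size, hidden_size)

# Copy the layer's parameters into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)
h = torch.zeros(batch, hidden_size)  # hidden state: hidden_size values per sequence
c = torch.zeros(batch, hidden_size)  # cell state: same width

# Apply the one cell seq_len times, feeding each state into the next step.
steps = []
for t in range(seq_len):
    h, c = cell(x[t], (h, c))
    steps.append(h)
unrolled = torch.stack(steps)  # (seq_len, batch, hidden_size)

output, _ = lstm(x)
print(unrolled.shape)  # torch.Size([5, 3, 20])
match = torch.allclose(unrolled, output, atol=1e-5)
print(match)
```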