The docs say
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
Now, the batch and hidden_size dimensions are pretty much self-explanatory. The first dimension remains a mystery, though.
I assume that the hidden states of the "last cells" of all layers are included in this output. But then which index holds, for example, the hidden state of the "last cell" in the "uppermost layer": h_n[-1] or h_n[0]?
Is the output affected by the batch_first option?
The output of the PyTorch LSTM layer is a tuple with two elements.
For example, if each LSTM cell has 512 units, the hidden_size of the LSTM layer would be 512, and stacking two such layers on top of each other gives num_layers=2. num_layers is the number of layers stacked on top of each other.
hidden_size is the number of features in the hidden state of the RNN, so increasing it means each hidden state output carries more features.
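For concreteness, here is a minimal sketch of that tuple and its shapes; the input_size, batch size, and sequence length below are arbitrary placeholders, not values taken from the question:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, num_layers = 10, 512, 2
seq_len, batch = 7, 3

lstm = nn.LSTM(input_size, hidden_size, num_layers)

x = torch.randn(seq_len, batch, input_size)   # default layout: (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)                  # the tuple: output and (h_n, c_n)

print(output.shape)  # torch.Size([7, 3, 512]) -> (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([2, 3, 512]) -> (num_layers * num_directions, batch, hidden_size)
print(c_n.shape)     # torch.Size([2, 3, 512]) -> same shape as h_n
```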
The LSTM and GRU implementations in PyTorch support stacked layers out of the box. You request this with the keyword argument nn.LSTM(num_layers=num_layers). num_layers is the number of stacked LSTMs (or GRUs); the default value is 1, which gives you a single-layer LSTM.
num_directions is either 1 or 2: 1 for a normal (unidirectional) LSTM or GRU, and 2 for a bidirectional RNN. So in your case you probably have a simple LSTM or GRU, and num_layers * num_directions would then be one.
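If the network is bidirectional, num_directions becomes 2 and the first dimension of h_n doubles. A small sketch (all sizes are made up):

```python
import torch
import torch.nn as nn

num_layers, hidden_size = 2, 16
uni = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers)
bi = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers, bidirectional=True)

x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

_, (h_uni, _) = uni(x)
_, (h_bi, _) = bi(x)

print(h_uni.shape)  # torch.Size([2, 3, 16]) -> num_layers * 1
print(h_bi.shape)   # torch.Size([4, 3, 16]) -> num_layers * 2
```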
h_n[0] is the hidden state of the bottom-most layer (the one that takes in the input), and h_n[-1] is that of the top-most layer (the one that produces the network's output).
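One way to convince yourself of this (for a unidirectional, stacked LSTM) is that h_n[-1] matches the last time step of output, since output only contains the top layer's hidden states. A quick check with made-up sizes:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=3)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

# output[t] is the top layer's hidden state at time step t, so its last
# time step must equal the top layer's final hidden state h_n[-1].
print(torch.allclose(output[-1], h_n[-1]))  # True
```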
batch_first puts the batch dimension before the time dimension (the default is the time dimension before the batch dimension). Because the hidden state has no time dimension, batch_first has no effect on the shape of h_n.
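To illustrate (again with placeholder sizes): with batch_first=True only the input and output tensors change layout, while h_n keeps the (num_layers * num_directions, batch, hidden_size) shape:

```python
import torch
import torch.nn as nn

lstm_bf = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)

x = torch.randn(3, 5, 8)  # (batch, seq_len, input_size) because batch_first=True
output, (h_n, c_n) = lstm_bf(x)

print(output.shape)  # torch.Size([3, 5, 16]) -> (batch, seq_len, hidden_size)
print(h_n.shape)     # torch.Size([2, 3, 16]) -> still (num_layers * num_directions, batch, hidden_size)
```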