
How is the output h_n of an RNN (nn.LSTM, nn.GRU, etc.) in PyTorch structured?

The docs say

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len

Now, the batch and hidden_size dimensions are pretty much self-explanatory. The first dimension remains a mystery, though.

I assume, that the hidden states of all "last cells" of all layers are included in this output. But then what is the index of, for example, the hidden state of the "last cell" in the "uppermost layer"? h_n[-1]? h_n[0]?

Is the output affected by the batch_first option?

asked Apr 05 '18 by the-bass

People also ask

What is the output of LSTM in Pytorch?

The output of the PyTorch LSTM layer is a tuple with two elements: the per-time-step outputs and the final states.

What is hidden size in Lstm Pytorch?

For example, with 512 units in each LSTM cell and two stacked layers, the hidden_size of the LSTM layer would be 512 and num_layers would be 2. num_layers is the number of layers stacked on top of each other.
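A minimal sketch of that configuration (the input_size, sequence length, and batch size below are arbitrary illustrative values):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 512 units per cell, 2 stacked layers.
lstm = nn.LSTM(input_size=100, hidden_size=512, num_layers=2)

x = torch.randn(7, 3, 100)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(h_n.shape)  # torch.Size([2, 3, 512]) -> (num_layers, batch, hidden_size)
```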

What is hidden size in Lstm?

Hidden size is the number of features in the hidden state of the RNN. So if you increase the hidden size, the hidden-state output becomes a larger feature vector.




1 Answer

The implementation of LSTM and GRU in PyTorch automatically supports stacked layers of LSTMs and GRUs.

You set this with the keyword argument num_layers, as in nn.LSTM(num_layers=num_layers). num_layers is the number of stacked LSTMs (or GRUs) that you have. The default value is 1, which gives you the basic LSTM.

num_directions is either 1 or 2. It is 1 for normal LSTMs and GRUs, and it is 2 for bidirectional RNNs.
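To make the num_directions factor concrete, here is a sketch with bidirectional=True (all sizes are illustrative), where the first dimension of h_n doubles:

```python
import torch
import torch.nn as nn

bi_lstm = nn.LSTM(input_size=5, hidden_size=4, num_layers=2, bidirectional=True)

x = torch.randn(6, 3, 5)  # (seq_len, batch, input_size)
output, (h_n, c_n) = bi_lstm(x)

# num_layers * num_directions = 2 * 2 = 4
print(h_n.shape)  # torch.Size([4, 3, 4])
```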

So in your case, you probably have a simple (single-layer, unidirectional) LSTM or GRU, in which case num_layers * num_directions is one.

h_n[0] is the hidden state of the bottom-most layer (the one which takes in the input), and h_n[-1] of the top-most layer (the one which outputs the output of the network).
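This indexing can be checked directly: for a unidirectional stacked LSTM, the top-layer entry h_n[-1] must match the last time step of output, which always comes from the top layer (sizes below are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=5, hidden_size=4, num_layers=3)

x = torch.randn(6, 2, 5)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# One hidden state per layer: h_n[0] is the bottom layer, h_n[-1] the top.
print(h_n.shape)  # torch.Size([3, 2, 4])

# The top-layer entry equals the last time step of `output`.
print(torch.allclose(h_n[-1], output[-1]))  # True
```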

batch_first puts the batch dimension before the time dimension (the default being the time dimension before the batch dimension). Because the hidden state doesn't have a time dimension, batch_first has no effect on the hidden state's shape.
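A quick sketch of that last point (sizes are illustrative): with batch_first=True the input and output swap their first two dimensions, but h_n keeps the same (num_layers * num_directions, batch, hidden_size) layout:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=5, hidden_size=4, num_layers=2, batch_first=True)

x = torch.randn(2, 6, 5)  # (batch, seq_len, input_size) because batch_first=True
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([2, 6, 4]) -- batch comes first
print(h_n.shape)     # torch.Size([2, 2, 4]) -- still (num_layers, batch, hidden_size)
```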

answered Oct 20 '22 by patapouf_ai