This is the API I am looking at: https://pytorch.org/docs/stable/nn.html#gru
It outputs:
output of shape (seq_len, batch, num_directions * hidden_size)
h_n of shape (num_layers * num_directions, batch, hidden_size)

For a GRU with more than one layer, I wonder how to fetch the hidden state of the last layer: should it be h_n[0] or h_n[-1]?
And if it's bidirectional, how should I slice h_n to obtain the last layer's hidden states for both directions?
The nn.GRU documentation is clear about this. Here is an example to make it more explicit:
For the unidirectional GRU/LSTM (with more than one hidden layer):
- output would contain all the output features (from the last layer) for all timesteps t
- h_n would contain the hidden state at the last timestep for all layers
To get the hidden state of the first or the last hidden layer at the last timestep, use:
first_hidden_layer_last_timestep = h_n[0]
last_hidden_layer_last_timestep = h_n[-1]
where the n in h_n refers to the last timestep, i.e. the sequence length.
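To make this concrete, here is a minimal runnable sketch (the sizes are arbitrary assumptions chosen for illustration). For a unidirectional GRU, output holds the last layer's features at every timestep, so output[-1] must equal h_n[-1]:

```python
import torch
import torch.nn as nn

# Minimal sketch with assumed sizes: a 2-layer unidirectional GRU.
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
output, h_n = gru(x)        # output: (5, 3, 20), h_n: (2, 3, 20)

# output contains the last layer's features at every timestep, so its last
# timestep equals the last layer's final hidden state, h_n[-1]:
assert torch.allclose(output[-1], h_n[-1])

# h_n[0] is the first layer's final hidden state, which is generally different.
print(h_n[0].shape, h_n[-1].shape)  # torch.Size([3, 20]) torch.Size([3, 20])
```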
This ordering follows from the description of num_layers:
num_layers – Number of recurrent layers. E.g., setting num_layers=2
would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results.
So it is natural and intuitive to return the hidden states in the same order: index 0 for the first layer and index -1 for the last.
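For the bidirectional part of the question: h_n keeps the layer and direction axes flattened together, so (as the PyTorch docs note for these recurrent layers) you can view it as (num_layers, num_directions, batch, hidden_size) and then index the last layer. A sketch with assumed sizes:

```python
import torch
import torch.nn as nn

num_layers, hidden_size, batch, seq_len = 2, 20, 3, 5
gru = nn.GRU(input_size=10, hidden_size=hidden_size,
             num_layers=num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, 10)
output, h_n = gru(x)        # h_n: (num_layers * 2, batch, hidden_size)

# Separate the flattened (layer, direction) axis, then take the last layer:
h_n = h_n.view(num_layers, 2, batch, hidden_size)
last_fwd = h_n[-1, 0]       # last layer, forward direction:  (batch, hidden_size)
last_bwd = h_n[-1, 1]       # last layer, backward direction: (batch, hidden_size)

# Sanity checks against output (seq_len, batch, 2 * hidden_size): the forward
# state is the first half of the last timestep's features; the backward
# direction finishes at timestep 0, so its state is the second half of the
# first timestep's features.
assert torch.allclose(last_fwd, output[-1, :, :hidden_size])
assert torch.allclose(last_bwd, output[0, :, hidden_size:])
```

If you want a single summary vector per sequence, concatenating last_fwd and last_bwd along the feature dimension gives the usual (batch, 2 * hidden_size) representation.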