The docs say
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
Now, the batch and hidden_size dimensions are pretty much self-explanatory. The first dimension remains a mystery, though.
I assume that the hidden states of the "last cells" of all layers are included in this output. But then which index holds, for example, the hidden state of the "last cell" in the "uppermost layer": h_n[-1] or h_n[0]?
Is the output affected by the batch_first option?
The output of the PyTorch LSTM layer is a tuple with two elements.
For example, if each LSTM cell has 512 units, the hidden_size of the LSTM layer would be 512, and stacking two such layers on top of each other gives num_layers=2. num_layers is the number of layers stacked on top of each other.
hidden_size is the number of features in the hidden state of the RNN, so increasing it means each hidden state output carries more features.
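For concreteness, here is a minimal sketch of that tuple and its shapes; the input_size, batch size, and sequence length below are arbitrary placeholders, not values taken from the question:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, num_layers = 10, 512, 2
seq_len, batch = 7, 3

lstm = nn.LSTM(input_size, hidden_size, num_layers)

x = torch.randn(seq_len, batch, input_size)   # default layout: (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)                  # the tuple: output and (h_n, c_n)

print(output.shape)  # torch.Size([7, 3, 512]) -> (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([2, 3, 512]) -> (num_layers * num_directions, batch, hidden_size)
print(c_n.shape)     # torch.Size([2, 3, 512]) -> same shape as h_n
```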
The LSTM and GRU implementations in PyTorch support stacked layers out of the box. You request this with the keyword argument nn.LSTM(num_layers=num_layers). num_layers is the number of stacked LSTMs (or GRUs); the default value is 1, which gives you a single-layer LSTM.
num_directions is either 1 or 2: 1 for a normal (unidirectional) LSTM or GRU, and 2 for a bidirectional RNN. So in your case you probably have a simple LSTM or GRU, and num_layers * num_directions would then be one.
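If the network is bidirectional, num_directions becomes 2 and the first dimension of h_n doubles. A small sketch (all sizes are made up):

```python
import torch
import torch.nn as nn

num_layers, hidden_size = 2, 16
uni = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers)
bi = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers, bidirectional=True)

x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

_, (h_uni, _) = uni(x)
_, (h_bi, _) = bi(x)

print(h_uni.shape)  # torch.Size([2, 3, 16]) -> num_layers * 1
print(h_bi.shape)   # torch.Size([4, 3, 16]) -> num_layers * 2
```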
h_n[0] is the hidden state of the bottom-most layer (the one that takes in the input), and h_n[-1] is that of the top-most layer (the one that produces the network's output).
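One way to convince yourself of this (for a unidirectional, stacked LSTM) is that h_n[-1] matches the last time step of output, since output only contains the top layer's hidden states. A quick check with made-up sizes:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=3)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

# output[t] is the top layer's hidden state at time step t, so its last
# time step must equal the top layer's final hidden state h_n[-1].
print(torch.allclose(output[-1], h_n[-1]))  # True
```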
batch_first puts the batch dimension before the time dimension (the default is the time dimension before the batch dimension). Because the hidden state has no time dimension, batch_first has no effect on the shape of h_n.
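To illustrate (again with placeholder sizes): with batch_first=True only the input and output tensors change layout, while h_n keeps the (num_layers * num_directions, batch, hidden_size) shape:

```python
import torch
import torch.nn as nn

lstm_bf = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)

x = torch.randn(3, 5, 8)  # (batch, seq_len, input_size) because batch_first=True
output, (h_n, c_n) = lstm_bf(x)

print(output.shape)  # torch.Size([3, 5, 16]) -> (batch, seq_len, hidden_size)
print(h_n.shape)     # torch.Size([2, 3, 16]) -> still (num_layers * num_directions, batch, hidden_size)
```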