I'm having trouble understanding the documentation for PyTorch's LSTM module (and also RNN and GRU, which are similar). Regarding the outputs, it says:
Outputs: output, (h_n, c_n)
- output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
- h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t=seq_len
- c_n (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t=seq_len
It seems that the variables output and h_n both give the values of the hidden state. Does h_n just redundantly provide the last time step that's already included in output, or is there something more to it than that?
The output of the PyTorch LSTM layer is a tuple with two elements. The first element of the tuple is the LSTM's output for all timesteps (h_t for t = 1, 2, …, T), with shape (timesteps, batch, output_features). The second element of the tuple is itself a tuple with two elements: h_n and c_n.
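A small sketch of that structure (the sizes here are made up, and bidirectional=True is used only to show where the num_directions factor from the quoted docs comes in):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; bidirectional=True means num_directions is 2.
lstm = nn.LSTM(input_size=10, hidden_size=20, bidirectional=True)
x = torch.randn(7, 4, 10)            # (timesteps, batch, input_features)

output, (h_n, c_n) = lstm(x)         # two-element tuple; second element is itself a tuple
print(output.shape)                  # (7, 4, 40): hidden_size * num_directions
print(h_n.shape, c_n.shape)          # (2, 4, 20) each: num_layers * num_directions
```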
For example, if each LSTM cell has 512 units and two layers of cells are stacked on top of each other, then the hidden_size of the LSTM layer would be 512 and num_layers would be 2. num_layers is the number of layers stacked on top of each other.
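As a sketch of that configuration (the input size and sequence/batch sizes below are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# 512 units per cell, two stacked layers; input_size=100 is just an example value.
lstm = nn.LSTM(input_size=100, hidden_size=512, num_layers=2)
x = torch.randn(30, 16, 100)         # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)
print(output.shape)                  # (30, 16, 512): all timesteps, last layer only
print(h_n.shape)                     # (2, 16, 512): last timestep, one slice per stacked layer
```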
The output of an LSTM cell or layer of cells is called the hidden state. This is confusing, because each LSTM cell retains an internal state that is not output, called the cell state, or c.
In Keras, by contrast, the default return value of an LSTM layer is a 2D array of real numbers: the first dimension is the number of samples in the batch given to the LSTM layer, and the second is the dimensionality of the output space, set by the units parameter of the Keras LSTM implementation.
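A minimal sketch of that Keras default, assuming the standard tf.keras API (all sizes are made up):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(32, 10, 8).astype("float32")  # (batch, timesteps, features)

layer = tf.keras.layers.LSTM(16)                 # units=16; return_sequences=False by default
out = layer(x)
print(out.shape)                                 # (32, 16): (batch_size, units)
```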
I made a diagram. The names follow the PyTorch docs, although I renamed num_layers to w.
output comprises all the hidden states in the last layer ("last" depth-wise, not time-wise). (h_n, c_n) comprises the hidden states after the last timestep, t = n, so you could potentially feed them into another LSTM. The batch dimension is not included in the diagram.
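To make the relationship concrete, here is a small sketch (with arbitrary sizes) showing that the last timestep of output matches the top-layer slice of h_n, and that (h_n, c_n) can be fed back in as an initial state:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2)
x = torch.randn(5, 3, 8)                       # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)
print(output.shape)                            # (5, 3, 16): every timestep, last layer only
print(h_n.shape)                               # (2, 3, 16): last timestep, every layer

# The last timestep of `output` equals the top layer's entry in h_n.
print(torch.allclose(output[-1], h_n[-1]))     # True

# (h_n, c_n) can seed another forward pass, e.g. for the next chunk of a long sequence.
next_chunk = torch.randn(5, 3, 8)
output2, (h_n2, c_n2) = lstm(next_chunk, (h_n, c_n))
```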