
Difference between 1 LSTM with num_layers = 2 and 2 LSTMs in pytorch

I am new to deep learning and am currently working on using LSTMs for language modeling. I was looking at the PyTorch documentation and got confused by it.

If I create a

nn.LSTM(input_size, hidden_size, num_layers) 

where hidden_size = 4 and num_layers = 2, I think I will have an architecture something like:

op0    op1 ....
LSTM -> LSTM -> h3
LSTM -> LSTM -> h2
LSTM -> LSTM -> h1
LSTM -> LSTM -> h0
x0     x1 .....

If I do something like

nn.LSTM(input_size, hidden_size, 1)
nn.LSTM(input_size, hidden_size, 1)

I think the network architecture will look exactly like the one above. Am I wrong? And if so, what is the difference between the two?

asked Mar 11 '18 by user3828311


People also ask

What is Num_layers in LSTM Pytorch?

num_layers in an RNN simply stacks RNNs on top of each other, so you get a final hidden state from each layer but an output sequence only from the topmost layer.

What is the output of LSTM layer Pytorch?

The PyTorch LSTM layer returns a tuple with two elements: the output sequence of the top layer, and a pair (h_n, c_n) holding the final hidden and cell states.
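As a minimal sketch (the tensor sizes here are arbitrary, not from the question):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=2)
x = torch.randn(5, 3, 10)      # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)   # the two-element tuple
print(output.shape)            # torch.Size([5, 3, 4]): top layer, every time step
print(h_n.shape)               # torch.Size([2, 3, 4]): final hidden state per layer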

How many LSTM layers should I use?

Generally, two layers have been shown to be enough to detect more complex features. More layers can be better but are also harder to train. As a general rule of thumb, one hidden layer works for simple problems, and two are enough to find reasonably complex features.

What is hidden size in LSTM Pytorch?

Here the hidden_size of the LSTM layer would be 512, as there are 512 units in each LSTM cell, and num_layers would be 2; num_layers is the number of LSTM layers stacked on top of each other.
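As a sketch, with the 512 units and 2 layers from above (the input size of 100 is an assumption for illustration):

import torch.nn as nn

# 512 units in each LSTM cell, 2 layers stacked on top of each other
lstm = nn.LSTM(input_size=100, hidden_size=512, num_layers=2)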


1 Answer

A multi-layer LSTM is better known as a stacked LSTM, where multiple LSTM layers are stacked on top of each other.

Your understanding is correct. The following two definitions of a stacked LSTM are the same.

nn.LSTM(input_size, hidden_size, 2)

and

from collections import OrderedDict

# illustrative only: nn.LSTM returns a tuple, so nn.Sequential
# cannot actually chain two LSTMs at run time
nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1)),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1))
]))

Here, the input is fed into the lowest LSTM layer, the output of that layer is forwarded to the next layer, and so on. Please note that the output size of the lowest LSTM layer, and the input size of every subsequent LSTM layer, is hidden_size.
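You can check these sizes from the weight shapes of a stacked nn.LSTM (the 10 and 4 are sizes chosen only for illustration; each weight_ih has 4 * hidden_size rows for the four gates):

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=2)
print(lstm.weight_ih_l0.shape)  # torch.Size([16, 10]): layer 0 consumes input_size
print(lstm.weight_ih_l1.shape)  # torch.Size([16, 4]):  layer 1 consumes hidden_size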

However, you may have seen people define a stacked LSTM in the following way:

rnns = nn.ModuleList()
for i in range(nlayers):
    # layer 0 consumes the input features; every later layer consumes hidden_size
    in_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(in_size, hidden_size, 1))

The reason people sometimes use the above approach is that if you create a stacked LSTM with the first two approaches, you can't get the per-timestep hidden states of each individual layer; nn.LSTM only returns the output sequence of the topmost layer. Check out what LSTM returns in PyTorch.
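A quick sketch of this point (sizes are again only illustrative): however many layers you stack, output covers only the top layer, and h_n holds only one final state per layer, not the full per-timestep sequences.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=3)
output, (h_n, c_n) = lstm(torch.randn(5, 2, 10))
print(output.shape)  # torch.Size([5, 2, 4]): per-timestep outputs of the top layer only
print(h_n.shape)     # torch.Size([3, 2, 4]): one final state per layer, one time step each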

So, if you want the intermediate layers' hidden states, you have to declare each layer as a separate single-layer LSTM and run through a loop to mimic the multi-layer LSTM operations. For example:

import torch.nn.functional as F

outputs = []
for i in range(nlayers):
    # apply dropout between layers (but not before the first one)
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)     # keep this layer's full output sequence
    sent_variable = output     # feed it to the next layer

In the end, outputs will contain the full sequence of hidden states from each individual LSTM layer.
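Putting the two fragments together, a minimal end-to-end sketch (the sizes and the p=0.2 dropout are assumptions carried over from the snippets above):

import torch
import torch.nn as nn
import torch.nn.functional as F

nlayers, input_size, hidden_size = 2, 10, 4

# build one single-layer LSTM per layer
rnns = nn.ModuleList()
for i in range(nlayers):
    in_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(in_size, hidden_size, 1))

sent_variable = torch.randn(5, 3, input_size)  # (seq_len, batch, input_size)

# run the layers one by one, keeping every layer's full output sequence
outputs = []
for i in range(nlayers):
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)
    sent_variable = output

for o in outputs:
    print(o.shape)  # torch.Size([5, 3, 4]) for each layer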

answered Sep 30 '22 by Wasi Ahmad