
Difference between 1 LSTM with num_layers = 2 and 2 LSTMs in pytorch

I am new to deep learning and am currently working on using LSTMs for language modeling. I was looking at the PyTorch documentation and got confused by it.

If I create a

nn.LSTM(input_size, hidden_size, num_layers) 

where hidden_size = 4 and num_layers = 2, I think I will have an architecture something like:

op0    op1 ....
LSTM -> LSTM -> h3
LSTM -> LSTM -> h2
LSTM -> LSTM -> h1
LSTM -> LSTM -> h0
x0     x1 .....

If I do something like

nn.LSTM(input_size, hidden_size, 1)
nn.LSTM(input_size, hidden_size, 1)

I think the network architecture will look exactly like the one above. Am I wrong? And if so, what is the difference between the two?

asked Mar 11 '18 by user3828311


People also ask

What is Num_layers in LSTM Pytorch?

num_layers in an RNN simply stacks RNNs on top of each other, so you get a final hidden state from each layer but an output sequence only from the topmost layer.

What is the output of LSTM layer Pytorch?

The PyTorch LSTM layer returns a tuple with two elements: the output sequence of the top layer, and a pair (h_n, c_n) holding the final hidden and cell states.
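As a minimal sketch (the tensor sizes here are arbitrary, not from the question):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=2)
x = torch.randn(5, 3, 10)      # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)   # the two-element tuple
print(output.shape)            # torch.Size([5, 3, 4]): top layer, every time step
print(h_n.shape)               # torch.Size([2, 3, 4]): final hidden state per layer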

How many LSTM layers should I use?

Generally, two layers have been shown to be enough to detect more complex features. More layers can be better but are also harder to train. As a general rule of thumb, one hidden layer works for simple problems, and two are enough to find reasonably complex features.

What is hidden size in LSTM Pytorch?

Here the hidden_size of the LSTM layer would be 512, as there are 512 units in each LSTM cell, and num_layers would be 2; num_layers is the number of LSTM layers stacked on top of each other.
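As a sketch, with the 512 units and 2 layers from above (the input size of 100 is an assumption for illustration):

import torch.nn as nn

# 512 units in each LSTM cell, 2 layers stacked on top of each other
lstm = nn.LSTM(input_size=100, hidden_size=512, num_layers=2)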


1 Answer

A multi-layer LSTM is better known as a stacked LSTM, where multiple LSTM layers are stacked on top of each other.

Your understanding is correct. The following two definitions of a stacked LSTM are the same.

nn.LSTM(input_size, hidden_size, 2)

and

from collections import OrderedDict

# illustrative only: nn.LSTM returns a tuple, so nn.Sequential
# cannot actually chain two LSTMs at run time
nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1)),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1))
]))

Here, the input is fed into the lowest LSTM layer, the output of that layer is forwarded to the next layer, and so on. Please note that the output size of the lowest LSTM layer, and the input size of every subsequent LSTM layer, is hidden_size.
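You can check these sizes from the weight shapes of a stacked nn.LSTM (the 10 and 4 are sizes chosen only for illustration; each weight_ih has 4 * hidden_size rows for the four gates):

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=2)
print(lstm.weight_ih_l0.shape)  # torch.Size([16, 10]): layer 0 consumes input_size
print(lstm.weight_ih_l1.shape)  # torch.Size([16, 4]):  layer 1 consumes hidden_size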

However, you may have seen people define a stacked LSTM in the following way:

rnns = nn.ModuleList()
for i in range(nlayers):
    # layer 0 consumes the input features; every later layer consumes hidden_size
    in_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(in_size, hidden_size, 1))

The reason people sometimes use the above approach is that if you create a stacked LSTM with the first two approaches, you can't get the per-timestep hidden states of each individual layer; nn.LSTM only returns the output sequence of the topmost layer. Check out what LSTM returns in PyTorch.
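A quick sketch of this point (sizes are again only illustrative): however many layers you stack, output covers only the top layer, and h_n holds only one final state per layer, not the full per-timestep sequences.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=3)
output, (h_n, c_n) = lstm(torch.randn(5, 2, 10))
print(output.shape)  # torch.Size([5, 2, 4]): per-timestep outputs of the top layer only
print(h_n.shape)     # torch.Size([3, 2, 4]): one final state per layer, one time step each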

So, if you want the intermediate layers' hidden states, you have to declare each layer as a separate single-layer LSTM and run through a loop to mimic the multi-layer LSTM operations. For example:

import torch.nn.functional as F

outputs = []
for i in range(nlayers):
    # apply dropout between layers (but not before the first one)
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)     # keep this layer's full output sequence
    sent_variable = output     # feed it to the next layer

In the end, outputs will contain the full sequence of hidden states from each individual LSTM layer.
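Putting the two fragments together, a minimal end-to-end sketch (the sizes and the p=0.2 dropout are assumptions carried over from the snippets above):

import torch
import torch.nn as nn
import torch.nn.functional as F

nlayers, input_size, hidden_size = 2, 10, 4

# build one single-layer LSTM per layer
rnns = nn.ModuleList()
for i in range(nlayers):
    in_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(in_size, hidden_size, 1))

sent_variable = torch.randn(5, 3, input_size)  # (seq_len, batch, input_size)

# run the layers one by one, keeping every layer's full output sequence
outputs = []
for i in range(nlayers):
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)
    sent_variable = output

for o in outputs:
    print(o.shape)  # torch.Size([5, 3, 4]) for each layer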

answered Sep 30 '22 by Wasi Ahmad