I was going through a tutorial about sentiment analysis using an LSTM network. The code below is said to stack up the LSTM output, but I don't understand how that works.
lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
The output of the PyTorch LSTM layer is a tuple with two elements.
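For reference, a minimal sketch of how that tuple is unpacked (not the tutorial's model; the sizes here are made up for illustration):

import torch
import torch.nn as nn

# nn.LSTM returns (output, (h_n, c_n)): the per-timestep outputs plus
# the final hidden and cell states.
lstm = nn.LSTM(input_size=10, hidden_size=4)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 3, 4]) -> one output per time step
print(h_n.shape)     # torch.Size([1, 3, 4]) -> final hidden state
print(c_n.shape)     # torch.Size([1, 3, 4]) -> final cell state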
To stack LSTM layers, we need to change the configuration of the prior LSTM layer so that it outputs a 3D array as input for the subsequent layer. In Keras (this argument does not exist in PyTorch), we can do this by setting the return_sequences argument on the layer to True (it defaults to False). This will return one output for each input time step and provide a 3D array, as sketched below.
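A minimal Keras sketch of that idea (the layer sizes and input shape are illustrative assumptions, not taken from the question):

from tensorflow import keras

model = keras.Sequential([
    # return_sequences=True makes this LSTM emit a 3D array
    # (batch, timesteps, units) so the next LSTM can consume it.
    keras.layers.LSTM(64, return_sequences=True, input_shape=(10, 8)),
    keras.layers.LSTM(32),  # the final LSTM returns only its last output
])
model.summary()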
Here, the hidden_size of the LSTM layer would be 512, since there are 512 units in each LSTM cell, and num_layers would be 2; num_layers is the number of LSTM layers stacked on top of each other.
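In PyTorch terms, a minimal sketch of that configuration (the input_size of 400 is an assumption for illustration):

import torch.nn as nn

# Two LSTM layers stacked on top of each other, each with 512 units.
lstm = nn.LSTM(input_size=400, hidden_size=512, num_layers=2, batch_first=True)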
It does indeed stack the output; the comment by kHarshit is misleading here!
To visualize this, let us review the output of the previous line in the tutorial (accessed May 1st, 2019):
lstm_out, hidden = self.lstm(embeds, hidden)
The output dimension of this will be [sequence_length, batch_size, hidden_size*2], as per the documentation. Here, the factor of two in the last dimension comes from having a bidirectional LSTM. Therefore, the first half of the last dimension will always be the forward output, followed by the backward output (I'm not entirely sure about the direction of that, but it seems to me that it is already in the right direction).
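A minimal sketch (the sizes are illustrative assumptions) of where that hidden_size*2 comes from and how the two halves split:

import torch
import torch.nn as nn

hidden_size = 4
lstm = nn.LSTM(input_size=10, hidden_size=hidden_size, bidirectional=True)
embeds = torch.randn(7, 3, 10)               # (seq_len, batch, input_size)
lstm_out, hidden = lstm(embeds)
print(lstm_out.shape)                        # torch.Size([7, 3, 8]) == hidden_size * 2
forward_out  = lstm_out[:, :, :hidden_size]  # first half of the last dimension
backward_out = lstm_out[:, :, hidden_size:]  # second half of the last dimension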
Then, the actual line that you are concerned about:
lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
We're ignoring the specifics of .contiguous() here, but you can read up on it in this excellent answer on Stack Overflow. In summary, it basically makes sure that your torch.Tensor is laid out contiguously in memory.
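A minimal sketch of why that matters: some operations (like .transpose()) return a non-contiguous view, and .view() then refuses to reshape it until you call .contiguous():

import torch

t = torch.arange(12).reshape(3, 4).transpose(0, 1)  # non-contiguous view
print(t.is_contiguous())        # False
# t.view(-1)                    # would raise a RuntimeError here
flat = t.contiguous().view(-1)  # works after copying into contiguous memory
print(flat.shape)               # torch.Size([12])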
Lastly, .view() allows you to reshape a tensor in a specific way. Here, we're aiming for a shape that has two dimensions (as defined by the number of arguments passed to .view()). Specifically, the second dimension is supposed to have the size hidden_dim. The -1 for the first dimension simply means that we redistribute the elements over that dimension without caring about its exact size, as long as the other dimension's requirement is satisfied.
So, if you have a vector of, say, length 40, and want to reshape it into a 2D tensor of (-1, 10), then the resulting tensor would have shape (4, 10).
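A minimal sketch of that reshape:

import torch

v = torch.arange(40)  # a vector of length 40
m = v.view(-1, 10)    # the -1 is inferred as 40 / 10 = 4
print(m.shape)        # torch.Size([4, 10])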
As we've said above, the first half of the vector (of length hidden_dim) is the forward output and the latter half is the backward output, so splitting it into a tensor of shape (-1, hidden_dim) results in a tensor of shape (2, hidden_dim), where the first row contains the forward output, "stacked" on top of the second row, which equals the reverse layer's output.
Visual example:
lstm_out, hidden = self.lstm(embeds, hidden)
print(lstm_out)  # imagine a sample output like [1, 0, 2, 0]
                 # forward out: [1, 0] | backward out: [2, 0]
stacked = lstm_out.contiguous().view(-1, hidden_dim)  # hidden_dim = 2
print(stacked)   # tensor([[1, 0],   <- forward out
                 #         [2, 0]])  <- backward out