Understanding multi-layer LSTM

I'm trying to understand and implement a multi-layer LSTM. The problem is I don't know how the layers connect. I have two ideas in mind:

  1. At each timestep, the hidden state H of the first LSTM will become the input of the second LSTM.

  2. At each timestep, the hidden state H of the first LSTM will become the initial value for the hidden state of the second LSTM, and the input of the first LSTM will also become the input of the second LSTM.

Please help!

asked Oct 17 '25 by Khoa Ngo


2 Answers

TL;DR: Each LSTM cell at time t and layer l takes an input and carries a hidden state h(l, t). In the first layer, the input is the actual sequence element x(t) together with that layer's previous hidden state h(1, t-1). In every higher layer, the input is instead the hidden state of the corresponding cell in the layer below, h(l-1, t), together with that layer's own previous hidden state h(l, t-1).
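
A minimal sketch of that wiring, assuming PyTorch (the question doesn't name a framework, and the sizes and variable names here are illustrative). It implements option 1 from the question: at every timestep, layer 1's hidden state is fed as the *input* of layer 2.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len, batch = 8, 16, 5, 3
x = torch.randn(seq_len, batch, input_size)    # the sequence x(t)

# Two LSTM cells stacked by hand.
cell1 = nn.LSTMCell(input_size, hidden_size)
cell2 = nn.LSTMCell(hidden_size, hidden_size)  # its input size is layer 1's hidden size

h1, c1 = torch.zeros(batch, hidden_size), torch.zeros(batch, hidden_size)
h2, c2 = torch.zeros(batch, hidden_size), torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h1, c1 = cell1(x[t], (h1, c1))   # layer 1 sees x(t) and its own h(1, t-1)
    h2, c2 = cell2(h1, (h2, c2))     # layer 2's input is h(1, t), not x(t)

# The built-in stacked module wires its layers the same way:
stacked = nn.LSTM(input_size, hidden_size, num_layers=2)
out, (h_n, c_n) = stacked(x)         # out[t] is h(2, t), the top layer's hidden state
```

Note that each layer still keeps its own hidden and cell state along the time axis; only the hidden state travels up the stack.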

From https://arxiv.org/pdf/1710.02254.pdf:

To increase the capacity of GRU networks (Hermans and Schrauwen 2013), recurrent layers can be stacked on top of each other. Since GRU does not have two output states, the same output hidden state h'2 is passed to the next vertical layer. In other words, the h1 of the next layer will be equal to h'2. This forces GRU to learn transformations that are useful along depth as well as time.
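
The same stacking can be sketched for GRU (again assuming PyTorch; sizes and names are illustrative): because a GRU carries only a single state, that one hidden state is what gets passed both forward in time and up to the next layer, as the quoted passage describes.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len, batch = 8, 16, 5, 3
x = torch.randn(seq_len, batch, input_size)

gru1 = nn.GRUCell(input_size, hidden_size)
gru2 = nn.GRUCell(hidden_size, hidden_size)

h1 = torch.zeros(batch, hidden_size)
h2 = torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h1 = gru1(x[t], h1)   # the lower layer's output hidden state ...
    h2 = gru2(h1, h2)     # ... is fed straight in as the upper layer's input
```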

answered Oct 19 '25 by Ido Cohn


I'll lean on colah's blog post, but trim it down to the specific part you need.

[Image: the chain of repeating LSTM cells, from colah's blog]

As you can see in the image above, LSTMs have a chain-like structure, and each repeating cell contains four neural network layers.

The value we pass on to the next timestep and up to the next layer (the hidden state) is basically the same thing, and it is the desired output. This output is based on the cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state we are going to output. Then we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to pass.

We also pass the cell state itself (the top arrow into the next cell) along to the next timestep. A sigmoid layer (the forget gate) decides, using the new input and the hidden state from the previous step, how much of that information we are going to keep.
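
Written out as a sketch (same PyTorch assumption as in the other answer; the weight names W, U, b are illustrative, not an established API), one cell step described by the two paragraphs above looks like this:

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b bundle the parameters of the four layers inside one cell:
    # input gate i, forget gate f, candidate values g, output gate o.
    gates = x_t @ W + h_prev @ U + b
    i, f, g, o = gates.chunk(4, dim=-1)

    f = torch.sigmoid(f)       # forget gate: how much of the old cell state to keep
    i = torch.sigmoid(i)       # input gate: how much of the new candidate to add
    g = torch.tanh(g)          # candidate values built from the new input and h_prev
    c_t = f * c_prev + i * g   # new cell state (the top arrow to the next timestep)

    o = torch.sigmoid(o)       # output gate: which parts of the cell state to expose
    h_t = o * torch.tanh(c_t)  # hidden state: a filtered view of the cell state,
                               # passed to the next timestep and up to the next layer
    return h_t, c_t

# Tiny usage example with random weights (shapes only, not trained values).
inp, hidden, batch = 8, 16, 3
W = torch.randn(inp, 4 * hidden)
U = torch.randn(hidden, 4 * hidden)
b = torch.zeros(4 * hidden)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)
h, c = lstm_step(torch.randn(batch, inp), h, c, W, U, b)
```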

Hope this helps.

answered Oct 19 '25 by Tushar Gupta


