Understanding multi-layer LSTM

I'm trying to understand and implement a multi-layer LSTM. The problem is I don't know how the layers connect. I have two ideas in mind:

  1. At each timestep, the hidden state H of the first LSTM will become the input of the second LSTM.

  2. At each timestep, the hidden state H of the first LSTM will become the initial value for the hidden state of the second LSTM, and the input of the first LSTM will also become the input of the second LSTM.

Please help!

asked Oct 17 '25 by Khoa Ngo


2 Answers

TL;DR: Each LSTM cell at time t and layer l takes an input and carries a hidden state h(l, t). In the first layer, the input is the actual sequence element x(t) together with that layer's previous hidden state h(1, t-1). In every higher layer, the input is instead the hidden state of the corresponding cell in the layer below, h(l-1, t), together with that layer's own previous hidden state h(l, t-1).
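
A minimal sketch of that wiring, assuming PyTorch (the question doesn't name a framework, and the sizes and variable names here are illustrative). It implements option 1 from the question: at every timestep, layer 1's hidden state is fed as the *input* of layer 2.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len, batch = 8, 16, 5, 3
x = torch.randn(seq_len, batch, input_size)    # the sequence x(t)

# Two LSTM cells stacked by hand.
cell1 = nn.LSTMCell(input_size, hidden_size)
cell2 = nn.LSTMCell(hidden_size, hidden_size)  # its input size is layer 1's hidden size

h1, c1 = torch.zeros(batch, hidden_size), torch.zeros(batch, hidden_size)
h2, c2 = torch.zeros(batch, hidden_size), torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h1, c1 = cell1(x[t], (h1, c1))   # layer 1 sees x(t) and its own h(1, t-1)
    h2, c2 = cell2(h1, (h2, c2))     # layer 2's input is h(1, t), not x(t)

# The built-in stacked module wires its layers the same way:
stacked = nn.LSTM(input_size, hidden_size, num_layers=2)
out, (h_n, c_n) = stacked(x)         # out[t] is h(2, t), the top layer's hidden state
```

Note that each layer still keeps its own hidden and cell state along the time axis; only the hidden state travels up the stack.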

From https://arxiv.org/pdf/1710.02254.pdf:

To increase the capacity of GRU networks (Hermans and Schrauwen 2013), recurrent layers can be stacked on top of each other. Since GRU does not have two output states, the same output hidden state h'2 is passed to the next vertical layer. In other words, the h1 of the next layer will be equal to h'2. This forces GRU to learn transformations that are useful along depth as well as time.
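
The same stacking can be sketched for GRU (again assuming PyTorch; sizes and names are illustrative): because a GRU carries only a single state, that one hidden state is what gets passed both forward in time and up to the next layer, as the quoted passage describes.

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len, batch = 8, 16, 5, 3
x = torch.randn(seq_len, batch, input_size)

gru1 = nn.GRUCell(input_size, hidden_size)
gru2 = nn.GRUCell(hidden_size, hidden_size)

h1 = torch.zeros(batch, hidden_size)
h2 = torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h1 = gru1(x[t], h1)   # the lower layer's output hidden state ...
    h2 = gru2(h1, h2)     # ... is fed straight in as the upper layer's input
```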

answered Oct 19 '25 by Ido Cohn


I'll lean on colah's blog post, but trim it down to the specific part you need.

[Image: the chain of repeating LSTM cells, from colah's blog]

As you can see in the image above, LSTMs have a chain-like structure, and each repeating cell contains four neural network layers.

The value we pass on to the next timestep and up to the next layer (the hidden state) is basically the same thing, and it is the desired output. This output is based on the cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state we are going to output. Then we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to pass.

We also pass the cell state itself (the top arrow into the next cell) along to the next timestep. A sigmoid layer (the forget gate) decides, using the new input and the hidden state from the previous step, how much of that information we are going to keep.
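
Written out as a sketch (same PyTorch assumption as in the other answer; the weight names W, U, b are illustrative, not an established API), one cell step described by the two paragraphs above looks like this:

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b bundle the parameters of the four layers inside one cell:
    # input gate i, forget gate f, candidate values g, output gate o.
    gates = x_t @ W + h_prev @ U + b
    i, f, g, o = gates.chunk(4, dim=-1)

    f = torch.sigmoid(f)       # forget gate: how much of the old cell state to keep
    i = torch.sigmoid(i)       # input gate: how much of the new candidate to add
    g = torch.tanh(g)          # candidate values built from the new input and h_prev
    c_t = f * c_prev + i * g   # new cell state (the top arrow to the next timestep)

    o = torch.sigmoid(o)       # output gate: which parts of the cell state to expose
    h_t = o * torch.tanh(c_t)  # hidden state: a filtered view of the cell state,
                               # passed to the next timestep and up to the next layer
    return h_t, c_t

# Tiny usage example with random weights (shapes only, not trained values).
inp, hidden, batch = 8, 16, 3
W = torch.randn(inp, 4 * hidden)
U = torch.randn(hidden, 4 * hidden)
b = torch.zeros(4 * hidden)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)
h, c = lstm_step(torch.randn(batch, inp), h, c, W, U, b)
```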

Hope this helps.

answered Oct 19 '25 by Tushar Gupta


