I'm having trouble understanding the exact scope of an LSTM cell, specifically how it maps to a network's layers. It seems to me that in a single-layer network the layer is the LSTM cell. How does this actually work in a multilayer RNN? From Graves (2014):
[Figure 1: Three-layer RNN]
[Figure: LSTM cell]
The output of the cell is h_t, with no superindex indicating a specific layer. The same goes for the equations. Does each cell span a single layer, or does each cell span all three nodes at each time step?
Each node named h in Figure 1 represents one LSTM cell. Note that h_{t-1}, h_t and h_{t+1} with the same superindex are the same cell; they are just unrolled in time. However, different superindices represent different LSTM cells.
The input of a cell with superindex 2 or 3 is not just the data sample x but also the output of the previous cell, that is, the cell in the layer below at the same time step.
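Written with explicit layer superindices, the wiring described above might look like the following. This is only a sketch: \mathcal{H} stands for the entire LSTM cell update of one layer, and the weight and bias names (W_{x h^n}, W_{h^{n-1} h^n}, W_{h^n h^n}, b^n_h) are assumed notation, not taken from the question.

```latex
\begin{align}
h^1_t &= \mathcal{H}\left(W_{x h^1}\, x_t + W_{h^1 h^1}\, h^1_{t-1} + b^1_h\right) \\
h^n_t &= \mathcal{H}\left(W_{x h^n}\, x_t + W_{h^{n-1} h^n}\, h^{n-1}_t + W_{h^n h^n}\, h^n_{t-1} + b^n_h\right), \quad n = 2, 3
\end{align}
```

Each superindex n has its own weights (one cell per layer), while the subscript t only indexes time steps of that same cell.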
You are correct. A single-layer RNN consists of one LSTM cell. In the multi-layer case, the input of an intermediate LSTM cell is the output of the previous LSTM cell. In Figure 1, the data sample x is also fed in along with that LSTM output.
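To make "one cell per layer, unrolled in time" concrete, here is a minimal pure-Python sketch. It assumes a plain LSTM without peephole connections (a simplification), random untrained weights, and hypothetical names (`LSTMCell`, `run_stack`) chosen only for illustration. Layers 2 and 3 receive the data sample x concatenated with the output of the layer below, as in Figure 1.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """One layer's cell: a single set of weights reused at every time step."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size  # the cell sees [input, h_{t-1}]
        # 4 gates (input, forget, output, candidate); last column is the bias.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                  for _ in range(4 * hidden_size)]

    def step(self, inp, h_prev, c_prev):
        v = inp + h_prev + [1.0]  # trailing 1.0 multiplies the bias column
        pre = [sum(w * a for w, a in zip(row, v)) for row in self.W]
        H = self.hidden_size
        i = [sigmoid(p) for p in pre[0:H]]
        f = [sigmoid(p) for p in pre[H:2 * H]]
        o = [sigmoid(p) for p in pre[2 * H:3 * H]]
        g = [math.tanh(p) for p in pre[3 * H:4 * H]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def run_stack(xs, cells, hidden_size):
    """Run a stack of cells over a sequence; returns h for every layer and step."""
    hs = [[0.0] * hidden_size for _ in cells]  # one hidden state per layer
    cs = [[0.0] * hidden_size for _ in cells]  # one memory state per layer
    outputs = []
    for x in xs:  # unrolling in time: the same cells are reused at each step
        below = []
        for n, cell in enumerate(cells):
            inp = x + below  # layer 1: x only; layers 2+: x plus h from below
            hs[n], cs[n] = cell.step(inp, hs[n], cs[n])
            below = hs[n]
        outputs.append([list(h) for h in hs])
    return outputs


rng = random.Random(0)
input_size, hidden_size = 2, 3
cells = [
    LSTMCell(input_size, hidden_size, rng),                # layer 1
    LSTMCell(input_size + hidden_size, hidden_size, rng),  # layer 2
    LSTMCell(input_size + hidden_size, hidden_size, rng),  # layer 3
]
xs = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.9], [0.0, 0.0]]
outputs = run_stack(xs, cells, hidden_size)
```

Here `outputs[t][n]` plays the role of the node h in layer n+1 at time step t: three cell objects exist in total, no matter how long the sequence is.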