I checked the Keras documentation for the LSTM layer; the information about the return_state argument is as below:
keras.layers.LSTM(units, return_state=True)
Arguments:
return_state: Boolean. Whether to return the last state in addition to the output.
Output shape
if return_state: a list of tensors. The first tensor is the output. The remaining tensors are the last states, each with shape (batch_size, units)
And that's all the info about return_state for RNNs. As a beginner, it's really hard to understand what exactly "The remaining tensors are the last states, each with shape (batch_size, units)" means, isn't it?
I do understand there is a cell state c and a hidden state a that are passed to the next time step.
But when I did the programming exercise for an online course, I encountered this question. Below is the hint given by the assignment, but I don't understand what these three outputs mean.
from keras.layers import LSTM
LSTM_cell = LSTM(n_a, return_state=True)
a, _, c = LSTM_cell(input_x, initial_state=[a, c])
Someone said they are, respectively (https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/):
1. The LSTM hidden state output for the last time step.
2. The LSTM hidden state output for the last time step (again).
3. The LSTM cell state for the last time step.
I always regard output a as the hidden state output of the LSTM, and c as the cell state output. But this person said the first output is the LSTM output, while the second one is the hidden state output, which is different from the hint given by the online course (the hint uses the first output as the hidden state output for the next time step).
Could anyone tell me more about this?
As a more general question: in cases like this, where Keras doesn't provide beginner-friendly documentation or examples, how can I learn Keras more efficiently?
Think about how you would start an iteration of the LSTM. You have a cell state c and an input x, but you also need the alleged previous output h, which is concatenated with x. The LSTM therefore has two hidden tensors that need to be initialized: c and h. Now h happens to be the output of the previous step, which is why you pass it as input together with c. When you set return_state=True, both c and h are returned. Together with the output, you'll therefore receive 3 tensors. Note that when return_sequences=False (the default), the output is exactly the last h, so the first two returned tensors are identical; that's why the hint can use the first one as the hidden state for the next step and discard the second with _.
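A quick way to see this is a tiny runnable example. Below is a minimal sketch assuming TensorFlow's bundled Keras (tensorflow.keras); the unit count and toy input shapes are arbitrary:

import numpy as np
from tensorflow.keras.layers import LSTM

# Toy batch: 2 sequences, 5 time steps, 3 features per step.
x = np.random.rand(2, 5, 3).astype("float32")

# return_state=True makes the call return [output, h, c].
lstm = LSTM(4, return_state=True)
output, h, c = lstm(x)

print(output.shape)  # (2, 4) -- (batch_size, units)
print(h.shape)       # (2, 4) -- last hidden state
print(c.shape)       # (2, 4) -- last cell state

# With return_sequences=False (the default), the output IS the last
# hidden state, so the first two tensors are identical:
print(np.allclose(output.numpy(), h.numpy()))  # True

# With return_sequences=True, the first tensor is instead the full
# sequence of hidden states, while h is still only the last one:
lstm_seq = LSTM(4, return_sequences=True, return_state=True)
outputs, h, c = lstm_seq(x)
print(outputs.shape)                                      # (2, 5, 4)
print(np.allclose(outputs.numpy()[:, -1, :], h.numpy()))  # True

So in the course hint, a (the first output) and the discarded second output carry the same values.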
output, h (hidden state), c (memory / cell state)
Take LSTM as an example; you can understand it like this:
c(t) depends on c(t-1) (and, through the gates, on x(t) and h(t-1));
o(t), the output gate, depends on x(t) and h(t-1);
h(t) depends on o(t) and c(t).
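For reference, the standard LSTM update equations make these dependencies explicit (sigmoid and tanh are applied elementwise, * is the elementwise product; f, i, o are the forget, input, and output gates, and c~ is the candidate cell state):

f(t) = sigmoid(Wf · [h(t-1), x(t)] + bf)
i(t) = sigmoid(Wi · [h(t-1), x(t)] + bi)
o(t) = sigmoid(Wo · [h(t-1), x(t)] + bo)
c~(t) = tanh(Wc · [h(t-1), x(t)] + bc)
c(t) = f(t) * c(t-1) + i(t) * c~(t)
h(t) = o(t) * tanh(c(t))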