 

What is the return_state output using Keras' RNN Layer

I checked the Keras documentation for the LSTM layer; the information about the return_state argument is as below:

keras.layers.LSTM(units, return_state=True)

Arguments:

return_state: Boolean. Whether to return the last state in addition to the output.

Output shape

if return_state: a list of tensors. The first tensor is the output. The remaining tensors are the last states, each with shape (batch_size, units)

And that's all the info about return_state for RNNs. As a beginner, it's really hard to understand what exactly "The remaining tensors are the last states, each with shape (batch_size, units)" means, isn't it?

I do understand there is a cell state c and a hidden state a that are passed to the next time step.

But when I did the programming exercise for an online course, I encountered this question. Below is the hint given by the assignment, but I don't understand what these three outputs mean.

from keras.layers import LSTM
LSTM_cell = LSTM(n_a, return_state = True)
a, _, c = LSTM_cell(input_x, initial_state=[a, c])

Someone said they are, respectively (https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/):

1. The LSTM hidden state output for the last time step.

2. The LSTM hidden state output for the last time step (again).

3. The LSTM cell state for the last time step.

I always regarded output a as the hidden state output of the LSTM, and c as the cell state output. But this person says the first output is the LSTM output, while the second one is the hidden state output, which differs from the hint given by the online course (the hint uses the first output as the hidden state input for the next time step).

Could anyone tell me more about this?

As a more general question: when Keras doesn't provide beginner-friendly documentation or examples, as in this case, how can I learn Keras more efficiently?

Jason asked Mar 06 '18

2 Answers

Think about how you would start an iteration of the LSTM. You have a cell state c and an input x, but you also need the previous output h, which is concatenated with x. The LSTM therefore has two internal tensors that need to be initialized: c and h. Now h happens to be the output of the previous step, which is why you pass it as input together with c. When you set return_state=True, both c and h are returned; together with the output, you therefore receive 3 tensors. Note that with return_sequences=False (the default), the first tensor (the output) and the second tensor (h) are identical: the layer's output is simply the last hidden state. That is why the hint can unpack the result as a, _, c and use the first tensor as the hidden state for the next step.
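You can check this yourself with a quick sketch (the shapes here, batch size 2, 5 time steps, 3 features, 8 units, are arbitrary example values, not anything from the course):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

n_a = 8
lstm = LSTM(n_a, return_state=True)

x = tf.random.uniform((2, 5, 3))      # (batch_size, time steps, features)
output, h, c = lstm(x)                # three tensors because return_state=True

print(output.shape, h.shape, c.shape)           # (2, 8) (2, 8) (2, 8)
print(np.allclose(output.numpy(), h.numpy()))   # True: output IS the last hidden state
```

So the "second output" in that blog post is not a different quantity; it is a duplicate of the first, which reconciles it with the course hint.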

Jus answered Sep 19 '22


The three returned tensors are: output, h (hidden state), c (memory/cell state).

Taking the LSTM as an example, you can understand it like this:

c(t) depends on c(t-1);  
o(t) depends on x(t) and h(t-1);  
h(t) depends on o(t) and c(t);  
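Those dependencies can be made concrete with a minimal NumPy sketch of one LSTM step. This is an illustration only: the function name lstm_step and the single stacked weight matrix W are my own hypothetical choices, not Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b, n_a):
    """One LSTM time step: c(t) depends on c(t-1); the gates depend on
    x(t) and h(t-1); h(t) depends on the output gate o(t) and c(t)."""
    z = W @ np.concatenate([h_prev, x_t]) + b  # linear map on [h(t-1); x(t)]
    f = sigmoid(z[0 * n_a:1 * n_a])            # forget gate
    i = sigmoid(z[1 * n_a:2 * n_a])            # input gate
    o = sigmoid(z[2 * n_a:3 * n_a])            # output gate o(t)
    g = np.tanh(z[3 * n_a:4 * n_a])            # candidate cell state
    c_t = f * c_prev + i * g                   # c(t) from c(t-1)
    h_t = o * np.tanh(c_t)                     # h(t) from o(t) and c(t)
    return h_t, c_t

n_a, n_x = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * n_a, n_a + n_x))
b = np.zeros(4 * n_a)

h, c = np.zeros(n_a), np.zeros(n_a)            # initial_state=[h, c]
h, c = lstm_step(rng.standard_normal(n_x), h, c, W, b, n_a)
print(h.shape, c.shape)                        # (4,) (4,)
```

The last h computed here is what Keras returns both as the layer output and as the hidden state when return_state=True; c is the third tensor.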
Ludwig Pully answered Sep 23 '22