Use hidden states instead of outputs in LSTMs of keras

I want to use an implementation of the attention mechanism by Yang et al. I found a working implementation of a custom layer that uses this attention mechanism here. Instead of using the output values of my LSTM:

my_lstm = LSTM(128, input_shape=(a, b), return_sequences=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

I would like to use the hidden states of the LSTM:

my_lstm = LSTM(128, input_shape=(a, b), return_state=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

But I get the error:

TypeError: can only concatenate tuple (not "int") to tuple

I tried it in combination with return_sequences, but everything I've tried has failed so far. How can I modify the returned tensors so that I can use them like the returned output sequences?

Thanks!

Asked Aug 11 '17 by V1nc3nt


People also ask

Is hidden state the output of LSTM?

Remember that an LSTM maintains two data states: the "Cell State" and the "Hidden State". By default, an LSTM cell returns the hidden state for a single time step (the latest one). However, Keras still computes the hidden state output by the LSTM at each time step.
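
For illustration, a minimal sketch (arbitrary shapes, assuming standalone Keras imports) of what return_state=True gives you:

from keras.layers import Input, LSTM
from keras.models import Model

x = Input(shape=(10, 8))                    # 10 timesteps, 8 features
out, h, c = LSTM(16, return_state=True)(x)  # out and h hold the same values here
print(Model(x, [out, h, c]).output_shape)   # [(None, 16), (None, 16), (None, 16)]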

What is the role of hidden state in LSTM?

The hidden state in an RNN is basically just like a hidden layer in a regular feed-forward network; it just happens to also be used as an additional input to the RNN at the next time step. The update is h_t = f(Wxh x_t + Whh h_(t-1)), where f is some non-linear function, Wxh is a weight matrix of size h×x, and Whh is a weight matrix of size h×h.
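
As a minimal NumPy sketch of one such recurrent step (sizes are made up: x of size 3, h of size 4):

import numpy as np

x_size, h_size = 3, 4
Wxh = np.random.randn(h_size, x_size)  # maps the input into hidden space (h×x)
Whh = np.random.randn(h_size, h_size)  # carries the previous hidden state forward (h×h)

h = np.zeros(h_size)                   # initial hidden state
for x in np.random.randn(5, x_size):   # five time steps of input
    h = np.tanh(Wxh @ x + Whh @ h)     # h_t = f(Wxh x_t + Whh h_(t-1)), with f = tanh
print(h)                               # final hidden state, one value per hidden unit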

How do you get the hidden state of LSTM in keras?

LSTM with return_sequences=True returns the hidden state of the LSTM for every timestep in the input to the LSTM. For example, if the input batch is (samples, timesteps, dims) , then the call LSTM(units, return_sequences=True) will generate output of dimensions (samples, timesteps, units) .
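
A quick sketch to confirm the shapes (arbitrary numbers, assuming standalone Keras imports):

from keras.layers import Input, LSTM
from keras.models import Model

x = Input(shape=(10, 8))                  # (timesteps, dims) = (10, 8)
seq = LSTM(16, return_sequences=True)(x)
print(Model(x, seq).output_shape)         # (None, 10, 16): one hidden state per timestep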

What is the output of an LSTM layer?

Given these inputs, the LSTM cell produces two outputs: a "true" output and a new hidden state. [Figure: the structure of an LSTM cell/module/unit.]


2 Answers

I think your confusion possibly stems from the Keras documentation being a little unclear.

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
return_state: Boolean. Whether to return the last state in addition to the output.

The docs on return_state are especially confusing because they imply that the hidden states are different from the outputs, but they are one and the same. For LSTMs this gets a little murky because, in addition to the hidden (output) state, there is the cell state. We can confirm this by looking at the LSTM step function in the Keras code:

class LSTM(Recurrent):
    def step(...):
        ...
        return h, [h, c]  # output, [hidden state, cell state]

The return type of this step function is output, states. So we can see that the hidden state h is actually the output, and for the states we get both the hidden state h and the cell state c. This is why you see the Wiki article you linked using the terms "hidden" and "output" interchangeably.
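
A quick way to convince yourself of this (a sketch with arbitrary shapes, combining both flags so the two values can be compared):

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

x = Input(shape=(10, 8))
seq, h, c = LSTM(16, return_sequences=True, return_state=True)(x)
model = Model(x, [seq, h, c])

seq_val, h_val, c_val = model.predict(np.random.rand(2, 10, 8))
print(np.allclose(seq_val[:, -1, :], h_val))  # True: the last "output" is the last hidden state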

Looking a little more closely at the paper you linked, it seems to me that your original implementation is what you want.

my_lstm = LSTM(128, input_shape=(a, b), return_sequences=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

This will pass the hidden state at each timestep to your attention layer. The only scenario where you are out of luck is the one where you actually want to pass the cell state from each timestep to your attention layer (which is what I thought initially), but I do not think this is what you want. The paper you linked actually uses a GRU layer, which has no concept of a cell state, and whose step function also returns the hidden state as the output.

class GRU(Recurrent):
    def step(...):
        ...
        return h, [h]  # output, [hidden state]; no cell state

So the paper is almost certainly referring to the hidden states (aka outputs) and not the cell states.
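
Putting it together, a minimal sketch of the full model in the functional API (assuming the AttentionWithContext layer from the implementation linked in the question, and the question's placeholder dimensions a and b):

from keras.layers import Input, LSTM, Dense
from keras.models import Model

inputs = Input(shape=(a, b))                    # a timesteps, b features, as in the question
seq = LSTM(128, return_sequences=True)(inputs)  # hidden state at every timestep
att = AttentionWithContext()(seq)               # attention collapses the timestep axis
out = Dense(2, activation='softmax')(att)
model = Model(inputs, out)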

Answered Nov 15 '22 by Nicole White


Just to add one point to Nicole's answer:

If we use the combination of return_state=True and return_sequences=True in an LSTM, then the first returned value is the hidden state (aka output) at every time step, with shape (samples, timesteps, units), whereas the second is the hidden state at the last time step only, with shape (samples, units). A third value, the cell state at the last time step, is returned as well.
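
A short sketch of that combination (arbitrary shapes, assuming standalone Keras imports):

from keras.layers import Input, LSTM
from keras.models import Model

x = Input(shape=(10, 8))
seq, h_last, c_last = LSTM(16, return_sequences=True, return_state=True)(x)
print(Model(x, [seq, h_last, c_last]).output_shape)
# [(None, 10, 16), (None, 16), (None, 16)]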

Answered Nov 15 '22 by Sunil