I have a dialog corpus like the one below, and I want to implement an LSTM model that predicts a system action. The system action is described as a bit vector, and a user input is calculated as a word embedding, which is also a bit vector.
t1: user: "Do you know an apple?", system: "no"(action=2)
t2: user: "xxxxxx", system: "yyyy" (action=0)
t3: user: "aaaaaa", system: "bbbb" (action=5)
So what I want to realize is a "many to many (2)" model: when my model receives a user input, it must output a system action.
But I cannot understand the return_sequences option and the TimeDistributed layer after the LSTM. To realize "many-to-many (2)", are return_sequences=True and a TimeDistributed layer after the LSTMs required? I would appreciate a more detailed description of them.
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
TimeDistributed: This wrapper allows to apply a layer to every temporal slice of an input.
I think I understand the return_sequences option, but I am still not sure about TimeDistributed. If I add a TimeDistributed layer after the LSTMs, is the model the same as my "many-to-many (2)" below? If so, I think Dense layers are applied to each output.
TimeDistributed class: This wrapper allows to apply a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension.
return_sequences refers to returning the hidden states a&lt;t&gt;. By default, return_sequences is set to False in Keras RNN layers, meaning the RNN layer will only return the last hidden state output a&lt;T&gt;. That last hidden state captures an abstract representation of the input sequence.
With return_sequences=True, the output is a 3D array of real numbers: the first dimension is the number of samples in the batch given to the LSTM layer, the second is the number of time steps in the input sequence, and the third is the number of units in the LSTM layer.
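For example, a minimal sketch (toy sizes, TensorFlow 2 / Keras assumed) that prints the shapes for both settings:

```python
import numpy as np
from tensorflow.keras import layers

# Toy batch: 4 samples, 10 time steps, 8 features per step (invented sizes).
x = np.random.rand(4, 10, 8).astype("float32")

# Default (return_sequences=False): only the last hidden state comes back.
last_state = layers.LSTM(16)(x)
print(last_state.shape)  # (4, 16) -> (batch, units)

# return_sequences=True: one hidden state per time step comes back.
all_states = layers.LSTM(16, return_sequences=True)(x)
print(all_states.shape)  # (4, 10, 16) -> (batch, time steps, units)
```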
The TimeDistributed layer is very useful for working with time series data or video frames. It allows applying the same layer to every time step: instead of having several separate input "models", we use "one model" applied to each input. A GRU or LSTM can then help manage the data in "time".
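For instance, a small sketch with a hypothetical per-frame feature tensor (all sizes invented), showing "one model" reused across frames, with a GRU reading the result over time:

```python
import numpy as np
from tensorflow.keras import layers

# Hypothetical video setup: 2 clips, 5 frames, 32 features per frame.
frames = np.random.rand(2, 5, 32).astype("float32")

# The same Dense(8) is applied to each of the 5 frames independently
# ("one model" reused per frame), then the GRU manages the time axis.
per_frame = layers.TimeDistributed(layers.Dense(8))(frames)
print(per_frame.shape)  # (2, 5, 8)

summary = layers.GRU(4)(per_frame)
print(summary.shape)    # (2, 4)
```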
The LSTM layer and the TimeDistributed wrapper are two different ways to get the "many to many" relationship that you want.
As you can see, the difference between the two is that the LSTM propagates information through the sequence: it will eat one word, update its state, and return it or not, then go on with the next word while still carrying information from the previous ones. With TimeDistributed, the words are processed in the same way on their own, as if they were in silos, and the same layer is applied to every one of them.
So you don't have to use an LSTM and TimeDistributed in a row; you can do whatever you want, just keep in mind what each of them does.
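One way to see the difference concretely (a toy sketch, sizes invented): perturb only the first time step and check which later outputs change.

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(1, 6, 8).astype("float32")  # 1 sample, 6 steps, 8 features
x2 = x.copy()
x2[0, 0, :] = 0.0                              # perturb only the first step

lstm = layers.LSTM(4, return_sequences=True)   # carries state across steps
td = layers.TimeDistributed(layers.Dense(4))   # processes each step in a silo

# The LSTM's output at the last step changes, because information from
# step 0 is propagated through the sequence...
print(np.allclose(lstm(x)[0, -1], lstm(x2)[0, -1]))  # False (almost surely)

# ...while TimeDistributed leaves the untouched steps exactly as before.
print(np.allclose(td(x)[0, -1], td(x2)[0, -1]))      # True
```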
I hope it's clearer?
EDIT:
TimeDistributed, in your case, applies a Dense layer to every element that was output by the LSTM.
Let's take an example:
You have a sequence of n_words words that are embedded in emb_size dimensions, so your input is a 2D tensor of shape (n_words, emb_size).

First you apply an LSTM with output dimension lstm_output and return_sequences=True. The output will still be a sequence, so it will be a 2D tensor of shape (n_words, lstm_output). You now have n_words vectors of length lstm_output.

Now you apply a TimeDistributed dense layer with, say, 3 output dimensions as the parameter of the Dense, i.e. TimeDistributed(Dense(3)). This will apply Dense(3) n_words times, to every vector of size lstm_output in your sequence independently, so they all become vectors of length 3. Your output will still be a sequence, hence a 2D tensor, now of shape (n_words, 3).
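Put into code, the example might look like this (a sketch; n_words, emb_size and lstm_output keep the names above, the values are invented):

```python
import numpy as np
from tensorflow.keras import layers

n_words, emb_size, lstm_output = 7, 50, 20  # invented values

# One sample: n_words embedded words. Keras expects a batch axis,
# so the tensor actually fed in has shape (1, n_words, emb_size).
x = np.random.rand(1, n_words, emb_size).astype("float32")

h = layers.LSTM(lstm_output, return_sequences=True)(x)
print(h.shape)  # (1, n_words, lstm_output)

y = layers.TimeDistributed(layers.Dense(3))(h)
print(y.shape)  # (1, n_words, 3)
```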
Is it clearer? :-)
return_sequences=True parameter:
If we want a sequence as output, not just a single vector as with normal neural networks, we need to set return_sequences to True. Concretely, say we have an input with shape (num_seq, seq_len, num_feature). If we don't set return_sequences=True, the output will have shape (num_seq, num_units), where num_units is the LSTM's output dimension; if we do, we obtain an output with shape (num_seq, seq_len, num_units).
TimeDistributed wrapper layer:
Since we set return_sequences=True in the LSTM layers, the output is now a three-dimensional tensor. If we feed that directly into a Dense layer, older Keras versions raise an error because Dense there only accepted two-dimensional input. To apply a layer across a three-dimensional input, we wrap it in TimeDistributed. This layer helps us maintain the output's shape, so that we achieve a sequence as output in the end.
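Tying this back to the dialog task in the question, a possible sketch (the sizes and the sigmoid output for the action bit vector are my assumptions, not something prescribed):

```python
from tensorflow import keras
from tensorflow.keras import layers

max_turns, emb_dim, n_actions = 10, 64, 6  # invented sizes

model = keras.Sequential([
    keras.Input(shape=(max_turns, emb_dim)),    # one embedded user turn per step
    layers.LSTM(32, return_sequences=True),     # keep an output for every turn
    layers.TimeDistributed(
        layers.Dense(n_actions, activation="sigmoid")  # action bit vector per turn
    ),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()  # final output shape: (None, max_turns, n_actions)
```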