I am trying to understand the use of the TimeDistributed layer in Keras/TensorFlow. I have read some threads and articles, but I still don't get it properly.
The threads that gave me some understanding of what the TimeDistributed layer does are:
What is the role of TimeDistributed layer in Keras?
TimeDistributed(Dense) vs Dense in Keras - Same number of parameters
But I still don't know why the layer is actually used!
For example, both of the code snippets below will produce the same output (and output_shape):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, TimeDistributed

# Model 1: LSTM wrapped in TimeDistributed
model = Sequential()
model.add(TimeDistributed(LSTM(5, input_shape=(10, 20), return_sequences=True)))
print(model.output_shape)

# Model 2: plain LSTM
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
print(model.output_shape)
And the output shape (as far as I understand) will be:
(None, 10, 5)
So, if both models produce the same output, what is the actual use of the TimeDistributed layer?
I also have another question. The TimeDistributed layer applies the same layer (with the same shared weights) to each time step of the data. So how is that different from unrolling the LSTM layer, which is provided in the Keras API as:
unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
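(Just to make sure I am reading that correctly, I believe unrolling is set directly on the recurrent layer itself, e.g.:)

from tensorflow.keras.layers import LSTM

# My understanding: unroll=True replaces the symbolic loop with one unrolled
# copy of the LSTM cell per time step, all sharing a single set of weights;
# it requires a fixed, known sequence length.
unrolled_lstm = LSTM(5, return_sequences=True, unroll=True)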
What is the difference between these two?
Thank you. I am still a newbie, so I have many questions.
TimeDistributed(layer, **kwargs): this wrapper allows you to apply a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension at index one of the first input will be considered to be the temporal dimension.
A Dense layer is a simple layer of neurons in which each neuron receives input from all the neurons of the previous layer, hence the name "dense"; a layer contains many such neurons. Dense layers are commonly used, for example, to classify an image based on the output of convolutional layers.
As the Keras documentation says, TimeDistributed is a wrapper that applies a layer to every temporal slice of an input.
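For instance, here is a minimal sketch of what that means for a Dense layer (the shapes are arbitrary, chosen just for illustration): the same Dense(8) weights are applied independently to each of the 10 temporal slices, so the layer has only 16 * 8 + 8 = 136 parameters regardless of the sequence length.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, TimeDistributed

# Input: (batch, 10 time steps, 16 features). The same Dense(8) layer, with
# the same weights, is applied to every one of the 10 temporal slices.
model = Sequential()
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
model.summary()  # output shape: (None, 10, 8), parameters: 16 * 8 + 8 = 136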
Here is a fuller example which might help:
Let's say you have video samples of cats and your task is a simple video classification problem: return 0 if the cat is not moving or 1 if it is moving. Let's assume your input shape is (None, 50, 25, 25, 3), which means you have 50 time steps (frames) per sample, and each frame is 25 by 25 with 3 channels (RGB).
Well, one approach would be to extract some "features" from each frame using a CNN layer such as Conv2D, and then pass them to an LSTM layer. But the feature extraction should be exactly the same for each frame, and this is where TimeDistributed comes to the rescue: you can wrap your Conv2D with it, and then pass the output to a Flatten layer that is also wrapped in TimeDistributed. After applying TimeDistributed(Conv2D(...)), the output would have a shape like (None, 50, 5, 5, 16), and after TimeDistributed(Flatten()) it would be (None, 50, 400). (The actual dimensions depend on the Conv2D parameters.)
The output of this layer can now be passed through an LSTM.
So obviously, LSTM itself does not need a TimeDistributed wrapper.
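To make that concrete, here is a rough sketch of such a model. The layer sizes (Conv2D(16, ...), the added MaxPooling2D, LSTM(64)) are arbitrary choices for illustration, not something the shapes above dictate, so the intermediate dimensions will differ from the ones quoted above:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     TimeDistributed, LSTM, Dense)

model = Sequential()
# The same Conv2D (one set of weights) is applied to each of the 50 frames.
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu'),
                          input_shape=(50, 25, 25, 3)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
# Flatten each frame's feature map so every time step becomes a single vector.
model.add(TimeDistributed(Flatten()))
# The LSTM consumes the sequence of 50 feature vectors directly; no wrapper needed.
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))  # 1 = cat is moving, 0 = not moving
model.summary()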