
TimeDistributed(Dense) vs Dense in seq2seq

Given the code below:

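from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

latent_dim = 256  # assumed value; the question does not define latent_dim

encoder_inputs = Input(shape=(16, 70))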
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(59, 93))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(93, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

If I change

decoder_dense = TimeDistributed(Dense(93, activation='softmax'))

to

decoder_dense = Dense(93, activation='softmax')

it still works, but which method is more effective?

william007 asked May 17 '26 09:05



1 Answer

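If your data has a time dimension, like time series data or the frames of a video, you want the same dense layer applied at every time step. TimeDistributed(Dense) does this explicitly. In Keras 2 and later, a plain Dense layer applied to a 3D tensor does the same thing, because Dense operates on the last axis of its input.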

TimeDistributed(Dense) applies the same dense layer, with the same weights, to every time step of the GRU/LSTM output sequence. That’s why the loss is computed between the predicted label sequence and the actual label sequence.
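To illustrate "the same dense layer at every time step", here is a minimal NumPy sketch rather than Keras code. The shapes mirror the question (59 time steps, 93 outputs). The batch size, the hidden size of 8, the random weights, and the helper names softmax, dense_per_step and dense_last_axis are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, timesteps, hidden, vocab = 2, 59, 8, 93   # batch=2 and hidden=8 are illustrative
h = rng.normal(size=(batch, timesteps, hidden))  # stand-in for the decoder LSTM output sequence
W = rng.normal(size=(hidden, vocab))             # one weight matrix shared by all time steps
b = rng.normal(size=(vocab,))                    # one bias vector shared by all time steps


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dense_per_step(h, W, b):
    """Apply the same dense layer to each time step in an explicit loop."""
    steps = [softmax(h[:, t, :] @ W + b) for t in range(h.shape[1])]
    return np.stack(steps, axis=1)


def dense_last_axis(h, W, b):
    """Apply the dense layer to the last axis of the whole 3D tensor at once."""
    return softmax(h @ W + b)


per_step = dense_per_step(h, W, b)
all_at_once = dense_last_axis(h, W, b)

print(per_step.shape)                      # (2, 59, 93)
print(np.allclose(per_step, all_at_once))  # True
```

Both functions use a single W and b, so the number of trainable parameters does not depend on the number of time steps.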

With return_sequences=False, the LSTM returns only the output of its last time step, so the Dense layer is applied just once, to that final output. This is normally the case when RNNs are used for classification problems.

With return_sequences=True, the Dense layer is applied at every time step, just like TimeDistributed(Dense).

In your model both versions are equivalent, so neither is more effective. If you change your second model to return_sequences=False, the Dense layer will be applied only to the output of the last time step.
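The effect of return_sequences on what the Dense layer receives can be sketched with NumPy shapes alone. This is not Keras code: h_seq stands in for the LSTM output with return_sequences=True, and slicing out its last time step stands in for return_sequences=False. The batch size, hidden size and random values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

batch, timesteps, hidden, classes = 4, 59, 8, 93  # batch=4 and hidden=8 are illustrative
W = rng.normal(size=(hidden, classes))            # dense kernel
b = np.zeros(classes)                             # dense bias

# return_sequences=True: the recurrent layer emits one vector per time step.
h_seq = rng.normal(size=(batch, timesteps, hidden))

# return_sequences=False: only the vector from the last time step is emitted.
h_last = h_seq[:, -1, :]

logits_seq = h_seq @ W + b    # dense applied at every time step
logits_last = h_last @ W + b  # dense applied once per sequence

print(logits_seq.shape)   # (4, 59, 93): one prediction per time step
print(logits_last.shape)  # (4, 93): one prediction per sequence

# The single prediction equals the last step of the per-step predictions.
print(np.allclose(logits_seq[:, -1, :], logits_last))  # True
```

For a seq2seq decoder the targets have one label per time step, so the per-step output shape is the one that matches them.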

Hope this helps. Happy Learning!

