I'm trying to understand the difference between the model described here (the following one):
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)                          # shape: (batch, latent_dim)
decoded = RepeatVector(timesteps)(encoded)                  # shape: (batch, timesteps, latent_dim)
decoded = LSTM(input_dim, return_sequences=True)(decoded)   # shape: (batch, timesteps, input_dim)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
and the sequence-to-sequence model described here (the second description).
What is the difference? Is it just that the first one has the RepeatVector while the second does not? Is the first model not taking the decoder's hidden state as the initial state for the prediction?
Is there a paper describing the first and the second one?
In the model using RepeatVector, they're not using any kind of fancy prediction, nor dealing with states. They're letting the model do everything internally, and RepeatVector is used to transform a (batch, latent_dim) vector (which is not a sequence) into a (batch, timesteps, latent_dim) tensor (which is now a proper sequence).
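If it helps to see the shapes, here is a minimal standalone sketch of what RepeatVector does; the sizes (timesteps=5, latent_dim=3) are just placeholders I picked for the demo, not values from the question:

import numpy as np
from keras.layers import Input, RepeatVector
from keras.models import Model

timesteps, latent_dim = 5, 3                        # placeholder sizes for the demo

vec_in = Input(shape=(latent_dim,))                 # a single vector per sample, no time axis
seq_out = RepeatVector(timesteps)(vec_in)           # that vector copied `timesteps` times
demo = Model(vec_in, seq_out)

x = np.arange(6, dtype='float32').reshape(2, 3)     # (batch=2, latent_dim=3)
y = demo.predict(x)
print(x.shape, '->', y.shape)                       # (2, 3) -> (2, 5, 3)
print(np.allclose(y, x[:, None, :]))                # True: every time step is a copy of the vector

So the decoder LSTM in the first model just receives the same latent vector at every time step and reconstructs the sequence from it.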
Now, in the other model, without RepeatVector, the secret lies in this additional function:
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence
This runs a "loop" based on a stop_condition
for creating the time steps one by one. (The advantage of this is making sentences without a fixed length).
It also explicitly takes the states generated in each step (in order to keep the proper connection between each individual step).
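In case the wiring is not obvious, here is a rough sketch of how encoder_model and decoder_model are typically built for this kind of inference loop. The layer names and sizes below are my own placeholders rather than values from the question, and in the real example these layers would be reused from the trained model instead of being created fresh:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

latent_dim = 256                    # placeholder sizes
num_encoder_tokens = 70
num_decoder_tokens = 90

# Encoder: keep only the final hidden and cell states, discard the outputs.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_model = Model(encoder_inputs, [state_h, state_c])

# Decoder: takes one step of input plus the previous states as initial_state,
# and returns its prediction together with its new states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, h, c = decoder_lstm(
    decoder_inputs,
    initial_state=[decoder_state_input_h, decoder_state_input_c])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)
decoder_model = Model(
    [decoder_inputs, decoder_state_input_h, decoder_state_input_c],
    [decoder_outputs, h, c])

The key point is that the states returned by the decoder at one step are fed back in as the initial state of the next step, which is exactly what the states_value = [h, c] line in decode_sequence does.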