What is the difference between return state and return sequence in a keras GRU layer?

I can't seem to wrap my head around the difference between return state and return sequence in a keras GRU layer.

Since a GRU unit does not have a cell state (it is equal to the output), how does return state differ from return sequence in a keras GRU layer?

More specifically, I built an encoder-decoder LSTM model with one encoder layer and one decoder layer. The encoder layer returns its state (return_state=True) and the decoder layer uses these states as its initial state (initial_state=encoder_states).

When trying to do this with GRU layers, I do not understand what states are passed between the encoder and decoder layer. Please let me know if you can clarify this. Thanks in advance.

asked Feb 26 '19 by StackMikeFlow


1 Answer

The "state" of a GRU layer will usually be the same as its "output". However, if you pass return_state=True and return_sequences=True, then the output of the layer will be the output after each element of the sequence, while the state will only be the state after the last element of the sequence is processed.
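A minimal sketch of this (assuming TensorFlow 2.x and hypothetical sizes: batch 2, 5 timesteps, 8 features, 4 units) shows the shapes each flag combination produces:

```python
import numpy as np
from tensorflow.keras.layers import GRU

x = np.random.rand(2, 5, 8).astype("float32")  # (batch, timesteps, features)

out = GRU(4)(x)                         # last output only: shape (2, 4)
seq = GRU(4, return_sequences=True)(x)  # output per timestep: shape (2, 5, 4)
out2, state = GRU(4, return_state=True)(x)  # both shape (2, 4)

# For a GRU, the returned state equals the last output
print(np.allclose(out2.numpy(), state.numpy()))
```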

Here's an example of an encoder/decoder for a seq2seq network using GRU layers:

# Assumes `vocab` and THOUGHT_VECTOR_SIZE are defined elsewhere
from tensorflow.keras.layers import Input, Embedding, GRU, Dense
from tensorflow.keras.models import Model

# Create layers
encoder_input_layer = Input(shape=(None,))
encoder_embedding_layer = Embedding(len(vocab), THOUGHT_VECTOR_SIZE)
encoder_gru_layer = GRU(THOUGHT_VECTOR_SIZE, return_state=True)

decoder_input_layer = Input(shape=(None,))
decoder_embedding_layer = Embedding(len(vocab), THOUGHT_VECTOR_SIZE)
decoder_gru_layer = GRU(THOUGHT_VECTOR_SIZE, return_sequences=True)
decoder_dense_layer = Dense(len(vocab), activation='softmax')


# Connect network
encoder = encoder_embedding_layer(encoder_input_layer)
encoder, encoder_state = encoder_gru_layer(encoder)

decoder = decoder_embedding_layer(decoder_input_layer)
decoder = decoder_gru_layer(decoder, initial_state=encoder_state)
decoder = decoder_dense_layer(decoder)

model = Model([encoder_input_layer, decoder_input_layer], decoder)

But to your point, using return_state isn't really necessary here as the output and state from encoder_gru_layer will be the same.
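To make that concrete, here is a self-contained sketch (assuming TensorFlow 2.x; UNITS and VOCAB are hypothetical sizes standing in for THOUGHT_VECTOR_SIZE and len(vocab)) where the encoder's plain output is fed directly as the decoder's initial_state, with no return_state at all:

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, GRU, Dense
from tensorflow.keras.models import Model

UNITS, VOCAB = 4, 10  # hypothetical sizes for illustration

enc_in = Input(shape=(None,))
enc = Embedding(VOCAB, UNITS)(enc_in)
enc_out = GRU(UNITS)(enc)  # final output == final state, so no return_state

dec_in = Input(shape=(None,))
dec = Embedding(VOCAB, UNITS)(dec_in)
dec = GRU(UNITS, return_sequences=True)(dec, initial_state=enc_out)
dec = Dense(VOCAB, activation='softmax')(dec)

model = Model([enc_in, dec_in], dec)
```

The model then predicts one softmax distribution over the vocabulary per decoder timestep.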

answered Oct 16 '22 by rob