 

Why set return_sequences=True and stateful=True for tf.keras.layers.LSTM?

I am learning TensorFlow 2.0 and following the tutorial. In the RNN example, I found this code:

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                         return_sequences=True,
                         stateful=True,
                         recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

My question is: why does the code set the arguments return_sequences=True and stateful=True? What would happen with the default arguments?

asked Mar 22 '19 by tidy

People also ask

What is return_sequences true in LSTM?

LSTM return_sequences=True value: when the return_sequences parameter is True, the layer outputs the hidden state of every time step. The output is a 3D array of real numbers; the third dimension is the dimensionality of the output space, defined by the units parameter in the Keras LSTM implementation.
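A quick way to see the difference (a minimal sketch; the sizes below are arbitrary illustrations, not values from the tutorial):

import tensorflow as tf

x = tf.random.normal([4, 10, 8])  # (batch, time_steps, features)

# return_sequences=True: one hidden state per time step -> 3D output
print(tf.keras.layers.LSTM(16, return_sequences=True)(x).shape)  # (4, 10, 16)

# default return_sequences=False: only the last hidden state -> 2D output
print(tf.keras.layers.LSTM(16)(x).shape)  # (4, 16)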

Where do we use return_sequences true?

You must set return_sequences=True when stacking LSTM layers so that the second LSTM layer has a three-dimensional sequence input. For more details, see the post: Stacked Long Short-Term Memory Networks.
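For example, a minimal two-layer stack (layer sizes here are arbitrary, for illustration only):

import tensorflow as tf

model = tf.keras.Sequential([
    # The first LSTM must return the full sequence so that the second
    # LSTM receives the 3D input (batch, time_steps, units) it expects.
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=(10, 8)),
    # The last LSTM can use the default and return only the final state.
    tf.keras.layers.LSTM(16)
])
model.summary()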

What is return_sequences true?

So with return_sequences=True, the output will be a sequence of the same length; with return_sequences=False, the output will be just one vector. TimeDistributed: this wrapper allows you to apply one layer (Dense, for example) to every element of your sequence independently.
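A minimal sketch of TimeDistributed applied after an LSTM (sizes are arbitrary):

import tensorflow as tf

x = tf.random.normal([4, 10, 8])                          # (batch, time_steps, features)
seq = tf.keras.layers.LSTM(16, return_sequences=True)(x)  # (4, 10, 16)

# Apply the same Dense layer independently at every time step
out = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(5))(seq)
print(out.shape)                                          # (4, 10, 5)

Note that a plain Dense layer applied to a 3D tensor behaves the same way in Keras, since it operates on the last axis; TimeDistributed just makes the per-time-step application explicit.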

What is stateful LSTM?

All RNN or LSTM models are stateful in theory. These models are meant to remember the entire sequence for prediction or classification tasks. However, in practice you need to create batches to train a model with the backpropagation algorithm, and the gradient cannot backpropagate between batches.


2 Answers

The example in the tutorial is about text generation. The network is fed batches of 64 sequences, each 100 characters long, drawn from a 65-character vocabulary; the relevant dimensions are:

(64, 100, 65) # (batch_size, sequence_length, vocab_size)

  1. return_sequences=True

The intention is to predict a character at every time step, i.e. for every character in the input sequence, the next character needs to be predicted.

So, the argument return_sequences=True is set to get an output shape of (64, 100, 65). If this argument were set to False, only the last output would be returned, so for a batch of 64 the output would be (64, 65), i.e. for every sequence of 100 characters, only the last predicted character would be returned.
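You can verify this by running a dummy batch through the question's build_model (a sketch; embedding_dim and rnn_units follow the tutorial's values, but any sizes would do):

import tensorflow as tf

model = build_model(vocab_size=65, embedding_dim=256,
                    rnn_units=1024, batch_size=64)

# A batch of 64 sequences of 100 character ids
dummy_ids = tf.zeros([64, 100], dtype=tf.int32)
print(model(dummy_ids).shape)  # (64, 100, 65)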

  2. stateful=True

From the documentation, "If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch."

In the below diagram from the tutorial, you can see that setting stateful helps the LSTM make better predictions by providing the context of the previous prediction.

[diagram from the tutorial]
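A minimal sketch of this behaviour (sizes are arbitrary; stateful=True needs a fixed batch size, hence batch_input_shape):

import tensorflow as tf

lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True, return_sequences=True,
                         batch_input_shape=(2, 5, 3))
])

chunk1 = tf.random.normal([2, 5, 3])  # (batch, time_steps, features)
chunk2 = tf.random.normal([2, 5, 3])  # continuation of the same 2 sequences

lstm(chunk1)          # final states of chunk1 are kept on the layer...
lstm(chunk2)          # ...and used as the initial states for chunk2

lstm.reset_states()   # clear the carried state once the sequences end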

answered Oct 19 '22 by Manoj Mohan


Return Sequences

Let's look at typical model architectures built using LSTMs.

Sequence to sequence models:

[diagram: a sequence-to-sequence LSTM, one output per time step]

We feed in a sequence of inputs (x's), one batch at a time, and each LSTM cell returns an output (y_i). So if your input is of size batch_size x time_steps x input_size, then the LSTM output will be batch_size x time_steps x output_size. This is called a sequence-to-sequence model because an input sequence is converted into an output sequence. Typical uses of this model are taggers (POS tagger, NER tagger). In Keras this is achieved by setting return_sequences=True.

Sequence classification - Many-to-one architecture

[diagram: a many-to-one LSTM, only the last cell's output is used]

In a many-to-one architecture we use the output state of only the last LSTM cell. This kind of architecture is normally used for classification problems, like predicting whether a movie review (represented as a sequence of words) is positive or negative. In Keras, if we set return_sequences=False, the model returns the output state of only the last LSTM cell.
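A minimal sketch of such a classifier (vocabulary and layer sizes are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 32),            # word ids -> vectors
    tf.keras.layers.LSTM(32),                        # default return_sequences=False
    tf.keras.layers.Dense(1, activation='sigmoid')   # positive vs negative
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()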

Stateful

An LSTM cell is composed of many gates, as shown in the figure below from this blog post. The states/gates of the previous cell are used to calculate the state of the current cell. In Keras, if stateful=False then the states are reset after each batch. If stateful=True, the states from the previous batch for index i will be used as the initial state for index i in the next batch, so state information gets propagated between batches. Check this link for an explanation of the usefulness of statefulness, with an example.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

answered Oct 19 '22 by mujjiga