 

Why set return_sequences=True and stateful=True for tf.keras.layers.LSTM?

I am learning TensorFlow 2.0 and following the tutorial. In the RNN example, I found this code:

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                         return_sequences=True,
                         stateful=True,
                         recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

My question is: why does the code set the arguments return_sequences=True and stateful=True? What would happen with the default arguments?

asked Mar 22 '19 by tidy

People also ask

What is return_sequences true in LSTM?

LSTM return_sequences=True value: when the return_sequences parameter is True, the layer outputs the hidden state of every time step. The output is a 3D array of real numbers; the third dimension is the dimensionality of the output space, defined by the units parameter in the Keras LSTM implementation.
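A quick way to see the difference (a minimal sketch; the sizes below are arbitrary illustrations, not values from the tutorial):

import tensorflow as tf

x = tf.random.normal([4, 10, 8])  # (batch, time_steps, features)

# return_sequences=True: one hidden state per time step -> 3D output
print(tf.keras.layers.LSTM(16, return_sequences=True)(x).shape)  # (4, 10, 16)

# default return_sequences=False: only the last hidden state -> 2D output
print(tf.keras.layers.LSTM(16)(x).shape)  # (4, 16)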

Where do we use return_sequences true?

You must set return_sequences=True when stacking LSTM layers so that the second LSTM layer has a three-dimensional sequence input. For more details, see the post: Stacked Long Short-Term Memory Networks.
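For example, a minimal two-layer stack (layer sizes here are arbitrary, for illustration only):

import tensorflow as tf

model = tf.keras.Sequential([
    # The first LSTM must return the full sequence so that the second
    # LSTM receives the 3D input (batch, time_steps, units) it expects.
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=(10, 8)),
    # The last LSTM can use the default and return only the final state.
    tf.keras.layers.LSTM(16)
])
model.summary()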

What is return_sequences true?

So with return_sequences=True, the output will be a sequence of the same length; with return_sequences=False, the output will be just one vector. TimeDistributed: this wrapper allows you to apply one layer (Dense, for example) to every element of your sequence independently.
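A minimal sketch of TimeDistributed applied after an LSTM (sizes are arbitrary):

import tensorflow as tf

x = tf.random.normal([4, 10, 8])                          # (batch, time_steps, features)
seq = tf.keras.layers.LSTM(16, return_sequences=True)(x)  # (4, 10, 16)

# Apply the same Dense layer independently at every time step
out = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(5))(seq)
print(out.shape)                                          # (4, 10, 5)

Note that a plain Dense layer applied to a 3D tensor behaves the same way in Keras, since it operates on the last axis; TimeDistributed just makes the per-time-step application explicit.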

What is stateful LSTM?

All RNN or LSTM models are stateful in theory. These models are meant to remember the entire sequence for prediction or classification tasks. However, in practice you need to create batches to train a model with the backpropagation algorithm, and the gradient cannot backpropagate between batches.


2 Answers

The example in the tutorial is about text generation. The network is fed batches of 64 sequences, each 100 characters long, drawn from a 65-character vocabulary; the relevant dimensions are:

(64, 100, 65) # (batch_size, sequence_length, vocab_size)

  1. return_sequences=True

The intention is to predict a character at every time step, i.e. for every character in the input sequence, the next character needs to be predicted.

So, the argument return_sequences=True is set to get an output shape of (64, 100, 65). If this argument were set to False, only the last output would be returned, so for a batch of 64 the output would be (64, 65), i.e. for every sequence of 100 characters, only the last predicted character would be returned.
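You can verify this by running a dummy batch through the question's build_model (a sketch; embedding_dim and rnn_units follow the tutorial's values, but any sizes would do):

import tensorflow as tf

model = build_model(vocab_size=65, embedding_dim=256,
                    rnn_units=1024, batch_size=64)

# A batch of 64 sequences of 100 character ids
dummy_ids = tf.zeros([64, 100], dtype=tf.int32)
print(model(dummy_ids).shape)  # (64, 100, 65)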

  2. stateful=True

From the documentation, "If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch."

In the below diagram from the tutorial, you can see that setting stateful helps the LSTM make better predictions by providing the context of the previous prediction.

[diagram from the tutorial]
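A minimal sketch of this behaviour (sizes are arbitrary; stateful=True needs a fixed batch size, hence batch_input_shape):

import tensorflow as tf

lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True, return_sequences=True,
                         batch_input_shape=(2, 5, 3))
])

chunk1 = tf.random.normal([2, 5, 3])  # (batch, time_steps, features)
chunk2 = tf.random.normal([2, 5, 3])  # continuation of the same 2 sequences

lstm(chunk1)          # final states of chunk1 are kept on the layer...
lstm(chunk2)          # ...and used as the initial states for chunk2

lstm.reset_states()   # clear the carried state once the sequences end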

answered Oct 19 '22 by Manoj Mohan


Return Sequences

Let's look at typical model architectures built using LSTMs.

Sequence to sequence models:

[diagram: a sequence-to-sequence LSTM, one output per time step]

We feed in a sequence of inputs (x's), one batch at a time, and each LSTM cell returns an output (y_i). So if your input is of size batch_size x time_steps x input_size, then the LSTM output will be batch_size x time_steps x output_size. This is called a sequence-to-sequence model because an input sequence is converted into an output sequence. Typical uses of this model are taggers (POS tagger, NER tagger). In Keras this is achieved by setting return_sequences=True.

Sequence classification - Many-to-one architecture

[diagram: a many-to-one LSTM, only the last cell's output is used]

In a many-to-one architecture we use the output state of only the last LSTM cell. This kind of architecture is normally used for classification problems, like predicting whether a movie review (represented as a sequence of words) is positive or negative. In Keras, if we set return_sequences=False, the model returns the output state of only the last LSTM cell.
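A minimal sketch of such a classifier (vocabulary and layer sizes are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 32),            # word ids -> vectors
    tf.keras.layers.LSTM(32),                        # default return_sequences=False
    tf.keras.layers.Dense(1, activation='sigmoid')   # positive vs negative
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()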

Stateful

An LSTM cell is composed of many gates, as shown in the figure below from this blog post. The states/gates of the previous cell are used to calculate the state of the current cell. In Keras, if stateful=False then the states are reset after each batch. If stateful=True, the states from the previous batch for index i will be used as the initial state for index i in the next batch, so state information gets propagated between batches. Check this link for an explanation of the usefulness of statefulness, with an example.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

answered Oct 19 '22 by mujjiga