As the documentation states,
"the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch,"
does it mean that to split the data into batches I need to do it the following way? E.g., let's assume that I am training a stateful RNN to predict the next integer in range(0, 5) given the previous one:
# batch_size = 3
# 0, 1, 2 etc in x are samples (timesteps and features omitted for brevity of the example)
x = [0, 1, 2, 3, 4]
y = [1, 2, 3, 4, 5]
batches_x = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
batches_y = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
Then the state after learning on x[0, 0] will be the initial state for x[1, 0], and the state after x[0, 1] will be the initial state for x[1, 1] (0 for 1, 1 for 2, etc.)?
Is this the right way to do it?
In practical terms, to determine the optimum batch size, we recommend trying smaller batch sizes first (usually 32 or 64), keeping in mind that small batch sizes require small learning rates. The batch size should be a power of 2 to take full advantage of the GPU's processing.
Generally a batch size of 32 or 25 is good, with epochs = 100, unless you have a large dataset. In the case of a large dataset you can go with a batch size of 10 and epochs between 50 and 100. Again, the figures mentioned above have worked fine for me. The value for batch size should preferably be a power of 2.
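For concreteness, batch size and epochs are just arguments to model.fit in Keras. A minimal sketch, assuming a toy dense model and random placeholder data (all names and sizes here are illustrative):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.rand(1000, 10).astype("float32")  # placeholder data, illustrative only
y_train = np.random.rand(1000, 1).astype("float32")

model = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(10,)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, batch_size=32, epochs=100)  # power-of-2 batch size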
If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size argument to a layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8).
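For example, a minimal sketch of that in Keras 2-style tf.keras (the LSTM's 16 units and the Dense head are arbitrary illustrative choices):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # batch_size=32 combined with input_shape=(6, 8) fixes the batch shape at (32, 6, 8)
    layers.LSTM(16, stateful=True, batch_size=32, input_shape=(6, 8)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")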
Our parallel coordinate plot also makes a key tradeoff very evident: larger batch sizes take less time to train but are less accurate.
This is based on this answer, which I verified with some tests of my own.
Stateful=False:
Normally (stateful=False), you have one batch with many sequences:
batch_x = [
[[0],[1],[2],[3],[4],[5]],
[[1],[2],[3],[4],[5],[6]],
[[2],[3],[4],[5],[6],[7]],
[[3],[4],[5],[6],[7],[8]]
]
The shape is (4, 6, 1). This means that you have one batch of 4 individual sequences, each with 6 time steps and 1 feature per step.
Every time you train, whether you repeat this batch or pass a new one, the network sees individual sequences. Every sequence is a unique, complete entry.
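A minimal sketch of this case, assuming a toy next-integer task (layer sizes and targets are illustrative):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# the (4, 6, 1) batch from above: 4 complete, independent sequences
batch_x = np.array([[[i + j] for j in range(6)] for i in range(4)], dtype="float32")
batch_y = batch_x + 1.0  # assumed next-integer targets, same shape

model = keras.Sequential([
    layers.LSTM(8, return_sequences=True, input_shape=(6, 1)),  # batch dimension left free
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(batch_x, batch_y, epochs=1)  # states are reset automatically after each batch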
Stateful=True:
When you move to a stateful layer, you are not going to pass individual sequences anymore. You are going to pass very long sequences divided into small batches, so you will need more batches:
batch_x1 = [
[[0],[1],[2]],
[[1],[2],[3]],
[[2],[3],[4]],
[[3],[4],[5]]
]
batch_x2 = [
[[3],[4],[5]], #continuation of batch_x1[0]
[[4],[5],[6]], #continuation of batch_x1[1]
[[5],[6],[7]], #continuation of batch_x1[2]
[[6],[7],[8]] #continuation of batch_x1[3]
]
Both shapes are (4, 3, 1). This means that each batch contains 4 sequences with 3 time steps and 1 feature per step, but these sequences are not complete: they are parts of longer sequences that continue from one batch to the next.
Stateful layers are meant for huge sequences, long enough to exceed your memory or your available time for some task. You then slice your sequences and process them in parts. There is no difference in the results; the layer is not smarter and has no additional capabilities. It simply doesn't consider the sequences to have ended after it processes one batch. It expects the continuation of those sequences.
In this case, you decide yourself when the sequences have ended, and you call model.reset_states() manually.
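Putting it together, a minimal sketch of the stateful training loop in Keras 2-style tf.keras (layer sizes, epoch count, and targets are illustrative assumptions):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# the two batches from above: row i of batch_x2 continues row i of batch_x1
batch_x1 = np.array([[[i + j] for j in range(3)] for i in range(4)], dtype="float32")  # (4, 3, 1)
batch_x2 = batch_x1 + 3.0                                                              # (4, 3, 1)
batch_y1, batch_y2 = batch_x1 + 1.0, batch_x2 + 1.0  # assumed next-integer targets

model = keras.Sequential([
    # fixed batch shape: the layer keeps one state vector per row of the batch
    layers.LSTM(8, stateful=True, return_sequences=True, batch_input_shape=(4, 3, 1)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for epoch in range(10):
    # shuffle=False keeps row i of each batch aligned with the state stored for row i
    model.fit(batch_x1, batch_y1, batch_size=4, epochs=1, shuffle=False)
    model.fit(batch_x2, batch_y2, batch_size=4, epochs=1, shuffle=False)
    model.reset_states()  # the long sequences have ended; start fresh next epoch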