Correct way to split data to batches for Keras stateful RNNs

As the Keras documentation states:

the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch

does it mean that, to split the data into batches, I need to do it the following way? For example, let's assume I am training a stateful RNN to predict the next integer in range(0, 5) given the previous one:

# batch_size = 3
# 0, 1, 2 etc in x are samples (timesteps and features omitted for brevity of the example)
x = [0, 1, 2, 3, 4]
y = [1, 2, 3, 4, 5]

batches_x = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
batches_y = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

Then the state after learning on x[0, 0] will be the initial state for x[1, 0], and x[0, 1] for x[1, 1] (0 for 1 and 1 for 2, etc.)?

Is it the right way to do it?

Bob asked Sep 20 '17 19:09

1 Answer

Based on this answer, I performed some tests.

Stateful=False:

Normally (stateful=False), you have one batch with many sequences:

batch_x = [
            [[0],[1],[2],[3],[4],[5]],
            [[1],[2],[3],[4],[5],[6]],
            [[2],[3],[4],[5],[6],[7]],
            [[3],[4],[5],[6],[7],[8]]
          ]

The shape is (4,6,1). This means that you have:

  • 1 batch
  • 4 individual sequences = this is batch size and it can vary
  • 6 steps per sequence
  • 1 feature per step

Every time you train, whether you repeat this batch or pass a new one, the model sees these as independent sequences. Every sequence is a unique entry.
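As a quick sanity check of the shape (a minimal NumPy sketch; the array below is exactly the batch_x from above):

```python
import numpy as np

# The single stateless batch from above: 4 independent sequences,
# 6 timesteps each, 1 feature per timestep.
batch_x = np.array([
    [[0], [1], [2], [3], [4], [5]],
    [[1], [2], [3], [4], [5], [6]],
    [[2], [3], [4], [5], [6], [7]],
    [[3], [4], [5], [6], [7], [8]],
])

print(batch_x.shape)  # (4, 6, 1) -> (batch_size, timesteps, features)
```

This is the shape a non-stateful recurrent layer expects for its input.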

Stateful=True:

When you move to a stateful layer, you are no longer passing individual sequences. You are passing very long sequences divided into small batches, so you will need more batches:

batch_x1 = [
             [[0],[1],[2]],
             [[1],[2],[3]],
             [[2],[3],[4]],
             [[3],[4],[5]]
           ]
batch_x2 = [
             [[3],[4],[5]], #continuation of batch_x1[0]
             [[4],[5],[6]], #continuation of batch_x1[1]
             [[5],[6],[7]], #continuation of batch_x1[2]
             [[6],[7],[8]]  #continuation of batch_x1[3]
           ]

Both shapes are (4, 3, 1). This means that you have:

  • 2 batches
  • 4 individual sequences = this is batch size and it must be constant
  • 6 steps per sequence (3 steps in each batch)
  • 1 feature per step

Stateful layers are meant for huge sequences, long enough to exceed your memory or your available time for some task. You then slice your sequences and process them in parts. There is no difference in the results; the layer is not smarter and has no additional capabilities. It simply doesn't consider the sequences to have ended after processing one batch. It expects the continuation of those sequences.

In this case, you decide yourself when the sequences have ended and call model.reset_states() manually.
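A minimal sketch of this slicing in NumPy (the model calls are shown only as comments, assuming a hypothetical Keras model built with stateful=True and batch_input_shape=(4, 3, 1)):

```python
import numpy as np

# Four long sequences of 6 steps each, as in the stateless example.
long_x = np.array([[[i + t] for t in range(6)] for i in range(4)])  # shape (4, 6, 1)

# Slice along the time axis into stateful sub-batches of 3 steps.
# Row i of every sub-batch continues row i of the previous one.
batch_x1 = long_x[:, 0:3, :]  # steps 0..2 of each sequence
batch_x2 = long_x[:, 3:6, :]  # steps 3..5 of each sequence

assert batch_x1.shape == batch_x2.shape == (4, 3, 1)

# With a stateful model you would feed the slices in order, then reset
# once the full sequences are done, e.g.:
#     model.train_on_batch(batch_x1, y1)
#     model.train_on_batch(batch_x2, y2)
#     model.reset_states()
print(batch_x2[0].ravel().tolist())  # [3, 4, 5] -> continuation of batch_x1[0]
```

The assertion confirms the constant batch size (4) that stateful layers require, and the printed row shows that batch_x2[0] continues batch_x1[0] exactly as in the listings above.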

Daniel Möller answered Sep 26 '22 01:09