
Keras fit_generator() - How does batch for time series work?

Context:

I am currently working on time series prediction using Keras with Tensorflow backend and, therefore, studied the tutorial provided here.

Following this tutorial, I came to the point where the generator for the fit_generator() method is described. The output this generator generates is as follows (left sample, right target):

[[[10. 15.]
  [20. 25.]]] => [[30. 35.]]     -> Batch no. 1: 2 Samples | 1 Target
  ---------------------------------------------
[[[20. 25.]
  [30. 35.]]] => [[40. 45.]]     -> Batch no. 2: 2 Samples | 1 Target
  ---------------------------------------------
[[[30. 35.]
  [40. 45.]]] => [[50. 55.]]     -> Batch no. 3: 2 Samples | 1 Target
  ---------------------------------------------
[[[40. 45.]
  [50. 55.]]] => [[60. 65.]]     -> Batch no. 4: 2 Samples | 1 Target
  ---------------------------------------------
[[[50. 55.]
  [60. 65.]]] => [[70. 75.]]     -> Batch no. 5: 2 Samples | 1 Target
  ---------------------------------------------
[[[60. 65.]
  [70. 75.]]] => [[80. 85.]]     -> Batch no. 6: 2 Samples | 1 Target
  ---------------------------------------------
[[[70. 75.]
  [80. 85.]]] => [[90. 95.]]     -> Batch no. 7: 2 Samples | 1 Target
  ---------------------------------------------
[[[80. 85.]
  [90. 95.]]] => [[100. 105.]]   -> Batch no. 8: 2 Samples | 1 Target

In the tutorial the TimeseriesGenerator class was used, but for my question it is secondary whether a custom generator or this class is used. Regarding the data, we have 8 steps_per_epoch and samples of shape (8, 1, 2, 2). The generator is fed to a recurrent neural network, implemented as an LSTM.
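For reference, the batches shown above can be reproduced without Keras. The following numpy sketch mimics what TimeseriesGenerator does under the settings assumed here (window length 2, batch size 1); it is an illustration, not the Keras implementation:

```python
import numpy as np

# the series from the tutorial: [[10, 15], [20, 25], ..., [100, 105]]
first = np.arange(10, 110, 10, dtype=float).reshape(-1, 1)
data = np.hstack([first, first + 5])

def batches(series, length=2, batch_size=1):
    # sliding windows of `length` timesteps, each targeting the next timestep
    x = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]

x1, y1 = next(batches(data))
# x1 has shape (1, 2, 2): one sample, two timesteps, two features
# y1 has shape (1, 2): [[30., 35.]]
```

Iterating over `batches(data)` yields exactly the 8 batches listed above.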

My questions

fit_generator() only allows a single target per batch, as output by the TimeseriesGenerator. When I first read about the option of batches for fit(), I thought that I could have multiple samples and a corresponding number of targets (which are processed batch-wise, meaning row by row). But this is not allowed by fit_generator() and is therefore, apparently, wrong. This would look, for example, like:

[[[10. 15. 20. 25.]]] => [[30. 35.]]     
[[[20. 25. 30. 35.]]] => [[40. 45.]]    
    |-> Batch no. 1: 2 Samples | 2 Targets
  ---------------------------------------------
[[[30. 35. 40. 45.]]] => [[50. 55.]]    
[[[40. 45. 50. 55.]]] => [[60. 65.]]    
    |-> Batch no. 2: 2 Samples | 2 Targets
  ---------------------------------------------
...

Secondly, I thought that, for example, [10, 15] and [20, 25] were used as input for the RNN consecutively for the target [30, 35], meaning that this is analogous to inputting [10, 15, 20, 25]. Since the output of the RNN differs when using the second approach (I tested it), this also has to be a wrong conclusion.

Hence, my questions are:

  1. Why is only one target per batch allowed (I know there are some workarounds, but there has to be a reason)?
  2. How may I understand the calculation of one batch? That is, how is an input like [[[40, 45], [50, 55]]] => [[60, 65]] processed, and why is it not analogous to [[[40, 45, 50, 55]]] => [[60, 65]]?



Edit according to @today's answer
Since there is some misunderstanding about my definition of samples and targets, I follow what I understand Keras is trying to tell me when it says:

ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 2 target samples.

This error occurs, when I create for example a batch which looks like:

#This is just a single batch - Multiple batches would be fed to fit_generator()
(array([[[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]]]), 
                           array([[ 5,  6,  7,  8,  9],
                           [10, 11, 12, 13, 14]]))

This is supposed to be a single batch containing two time-sequences of length 5 (5 consecutive data points / time-steps), whose targets are also two corresponding sequences. [ 5, 6, 7, 8, 9] is the target of [0, 1, 2, 3, 4] and [10, 11, 12, 13, 14] is the corresponding target of [5, 6, 7, 8, 9].
The sample shape in this case would be shape(number_of_batches, number_of_elements_per_batch, sequence_size) and the target shape shape(number_of_elements_per_batch, sequence_size).
Keras sees 2 target samples (in the ValueError) because I have to provide 3D samples as input but 2D targets as output (maybe I just don't get how to provide 3D targets..).

Anyhow, according to @today's answer/comments, this is interpreted by Keras as two timesteps and five features. Regarding my first question (where I still see a sequence as the target to my sequence, as in this edit example), I seek information on how/whether I can achieve this and what such a batch would look like (as I tried to visualize in the question).

asked May 21 '19 by Markus

People also ask

What is fit_generator in Keras?

keras.fit() and keras.fit_generator() in Python are two separate methods that can be used to train our machine learning and deep learning models. Both functions can do the same task, but when to use which function is the main question.

What is the difference between fit and fit_generator?

You pass your whole dataset at once in the fit method; use it if you can load the whole dataset into memory (small datasets). In fit_generator(), you don't pass x and y directly; instead, they come from a generator.

What is steps_per_epoch?

steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.


1 Answer

Short answers:

Why is only one target per batch allowed (I know there are some workarounds, but there has to be a reason)?

That's not the case at all. There is no restriction on the number of target samples in a batch. The only requirement is that you should have the same number of input and target samples in each batch. Read the long answer for further clarification.
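To illustrate this point, a generator can just as well yield two input samples and two target samples per batch, as long as the first dimensions match. A minimal numpy sketch with hypothetical data:

```python
import numpy as np

def two_sample_batches(series, length=2, batch_size=2):
    # every yielded batch has batch_size input samples AND batch_size targets
    x = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]

series = np.arange(20, dtype=float).reshape(-1, 2)  # 10 timesteps, 2 features
for bx, by in two_sample_batches(series):
    assert bx.shape[0] == by.shape[0]  # equal number of input and target samples
```

Each yielded pair here has shapes (2, 2, 2) and (2, 2): two input samples and two target samples per batch, which fit_generator() accepts without complaint.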

How may I understand the calculation of one batch? Meaning, how is some input like [[[40, 45], [50, 55]]] => [[60, 65]] processed and why is it not analog to [[[40, 45, 50, 55]]] => [[60, 65]]?

The first one is a multivariate timeseries (i.e. each timestep has more than one feature), and the second one is a univariate timeseries (i.e. each timestep has one feature). So they are not equivalent. Read the long answer for further clarification.

Long answer:

I'll give the answer I mentioned in comments section and try to elaborate on it using examples:

I think you are mixing up samples, timesteps, features and targets. Let me describe how I understand it: in the first example you provided, it seems that each input sample consists of 2 timesteps, e.g. [10, 15] and [20, 25], where each timestep consists of two features, e.g. 10 and 15, or 20 and 25. Further, the corresponding target consists of one timestep, e.g. [30, 35], which also has two features. In other words, each input sample in a batch must have a corresponding target. However, the shape of each input sample and the shape of its corresponding target need not be the same.

For example, consider a model where both its input and output are timeseries. If we denote the shape of each input sample as (input_num_timesteps, input_num_features) and the shape of each target (i.e. output) array as (output_num_timesteps, output_num_features), we would have the following cases:

1) The number of input and output timesteps are the same (i.e. input_num_timesteps == output_num_timesteps). Just as an example, the following model could achieve this:

from keras import layers
from keras import models

inp = layers.Input(shape=(input_num_timesteps, input_num_features))

# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(..., return_sequences=True)(x)

# a final RNN layer that has `output_num_features` units
out = layers.LSTM(output_num_features, return_sequences=True)(x)

model = models.Model(inp, out)

2) The number of input and output timesteps are different (i.e. input_num_timesteps != output_num_timesteps). This is usually achieved by first encoding the input timeseries into a vector using a stack of one or more LSTM layers, and then repeating that vector output_num_timesteps times to get a timeseries of the desired length. For the repeat operation, we can easily use the RepeatVector layer in Keras. Again, just as an example, the following model could achieve this:

from keras import layers
from keras import models

inp = layers.Input(shape=(input_num_timesteps, input_num_features))

# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(...)(x)  # The last layer ONLY returns the last output of RNN (i.e. return_sequences=False)

# repeat `x` as needed (i.e. the number of timesteps in the output timeseries)
x = layers.RepeatVector(output_num_timesteps)(x)

# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(x)
# ...
out = layers.LSTM(output_num_features, return_sequences=True)(x)

model = models.Model(inp, out)
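To see what RepeatVector does to the shapes, here is the equivalent numpy operation (an illustration of the shape transformation, not the Keras implementation):

```python
import numpy as np

encoded = np.array([[0.1, 0.2, 0.3]])  # output of the last LSTM: (1 sample, 3 features)
output_num_timesteps = 4

# RepeatVector(4) turns (batch, features) into (batch, 4, features)
repeated = np.repeat(encoded[:, None, :], output_num_timesteps, axis=1)
# repeated.shape == (1, 4, 3); every timestep is a copy of the encoded vector
```

The RNN stack after RepeatVector then unrolls this repeated vector into the output sequence.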

As a special case, if the number of output timesteps is 1 (e.g. the network is trying to predict the next timestep given the last t timesteps), we may not need RepeatVector at all; instead we can just use a Dense layer (in this case the output shape of the model would be (None, output_num_features), and not (None, 1, output_num_features)):

inp = layers.Input(shape=(input_num_timesteps, input_num_features))

# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(...)(x)  # The last layer ONLY returns the last output of RNN (i.e. return_sequences=False)

out = layers.Dense(output_num_features, activation=...)(x)

model = models.Model(inp, out)

Note that the architectures provided above are just for illustration, and you may need to tune or adapt them, e.g. by adding more layers such as Dense layer, based on your use case and the problem you are trying to solve.


Update: The problem is that you are not paying enough attention when reading my comments and answer, as well as the error raised by Keras. The error clearly states that:

... Found 1 input samples and 2 target samples.

So, after reading this carefully, if I were you I would say to myself: "OK, Keras thinks that the input batch has 1 input sample, but I think I am providing two samples!! Since I am a very good person(!), I think it's much more likely that I am wrong than Keras, so let's find out what I am doing wrong!" A simple and quick check is to just examine the shape of the input array:

>>> np.array([[[0, 1, 2, 3, 4],
               [5, 6, 7, 8, 9]]]).shape
(1, 2, 5)

"Oh, it says (1,2,5)! So that means one sample which has two timesteps and each timestep has five features!!! So I was wrong into thinking that this array consists of two samples of length 5 where each timestep is of length 1!! So what should I do now???" Well, you can fix it, step-by-step:

# step 1: I want a numpy array
s1 = np.array([])

# step 2: I want it to have two samples
s2 = np.array([
               [],
               []
              ])

# step 3: I want each sample to have 5 timesteps of length 1 in them
s3 = np.array([
               [
                [0], [1], [2], [3], [4]
               ],
               [
                [5], [6], [7], [8], [9]
               ]
              ])

>>> s3.shape
(2, 5, 1)

Voila! We did it! This was the input array; now check the target array, it must have two target samples of length 5 each with one feature, i.e. having a shape of (2, 5, 1):

>>> np.array([[ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]]).shape
(2, 5)

Almost! The last dimension (i.e. 1) is missing (NOTE: depending on the architecture of your model you may or may not need that last axis). So we can use the step-by-step approach above to find our mistake, or alternatively we can be a bit clever and just add an axis to the end:

>>> t = np.array([[ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14]])
>>> t = np.expand_dims(t, axis=-1)
>>> t.shape
(2, 5, 1)
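As a final sanity check, with both arrays reshaped this way, the input and target sample counts now agree, which is exactly the condition the ValueError complained about:

```python
import numpy as np

x = np.array([[[0], [1], [2], [3], [4]],
              [[5], [6], [7], [8], [9]]])
y = np.expand_dims(np.array([[ 5,  6,  7,  8,  9],
                             [10, 11, 12, 13, 14]]), axis=-1)

# both have 2 samples of 5 timesteps with 1 feature each
print(x.shape, y.shape)  # (2, 5, 1) (2, 5, 1)
```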

Sorry, I can't explain it better than this! But in any case, when you see that something (i.e. shape of input/target arrays) is repeated over and over in my comments and my answer, assume that it must be something important and should be checked.

answered Oct 25 '22 by today