I am doing a time series forecasting exercise using the window method, but I am struggling to understand how to do the out-of-sample forecast. Here is the code:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
  dataset = tf.data.Dataset.from_tensor_slices(series)
  dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
  dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
  dataset = dataset.shuffle(shuffle_buffer).map(lambda window: (window[:-1], window[-1]))
  dataset = dataset.batch(batch_size).prefetch(1)
  return dataset
dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)
The windowed_dataset function splits the univariate time series into a matrix of overlapping windows. Suppose we have the following dataset:
dataset = tf.data.Dataset.range(10)
for val in dataset:
   print(val.numpy())
0
1
2
3
4
5
6
7
8
9
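The windowed_dataset function converts the series into windows, with the x features on the left and the y label on the right (here with a window size of 4; the rows are out of order because the dataset is shuffled):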
[2 3 4 5] [6]
[4 5 6 7] [8]
[3 4 5 6] [7]
[1 2 3 4] [5]
[5 6 7 8] [9]
[0 1 2 3] [4]
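To make the windowing concrete, here is a minimal NumPy sketch of the same split without shuffling. It is not the tf.data pipeline from the question; it assumes NumPy 1.20 or later for sliding_window_view, and the names windows, X and y are only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10)
window_size = 4

# Each row holds window_size inputs followed by one label.
windows = sliding_window_view(series, window_size + 1)
X, y = windows[:, :-1], windows[:, -1]

# Rows come out in time order because nothing is shuffled here.
for features, label in zip(X, y):
    print(features, label)
```

Row i of X covers series[i : i + window_size], and y[i] is series[i + window_size], which is the same pairing the tf.data version produces before the shuffle step.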
In the next step, we implement the neural network model on the training dataset as follows:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"), 
    tf.keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)
Up to here, I am fine with the code. However, I am struggling to understand the out-of-sample forecasting shown below:
forecast = []
for time in range(len(series) - window_size):  
  forecast.append(model.predict(series[time:time + window_size][np.newaxis]))
forecast = forecast[split_time-window_size:]
Can someone please explain why we use a loop here (for time in range(len(series) - window_size))? Why not simply call model.predict(dataset_validation) for the validation part and model.predict(dataset) for the training part?
I don't understand the need for the for loop, because this is not a rolling forecast: we are not re-training the model.
While I understand why the data science community structures the dataset this way, I personally find it a lot clearer to split X and y, fit with model.fit(X, y, epochs=100, verbose=0), and predict with model.predict(X).
The for loop returns the predictions in time order, whereas model.predict(dataset_validation) returns them in shuffled order (assuming you shuffled the dataset), so they would no longer line up with the time steps.
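As a rough illustration, the sketch below compares the loop with a single batched call over unshuffled windows. The predict function is a hypothetical stand-in for model.predict (it just averages each window) so the example runs without a trained model, and the series and split_time values are made up. With a real Keras model you would pass the same windows array to model.predict.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def predict(batch):
    # Hypothetical stand-in for model.predict: one value per window, shape (n, 1).
    return batch.mean(axis=1, keepdims=True)

series = np.arange(20, dtype=np.float32)  # made-up data
window_size = 4
split_time = 15  # made-up train/validation boundary

# Loop version from the question: one window per call, in time order.
loop_forecast = []
for time in range(len(series) - window_size):
    loop_forecast.append(predict(series[time:time + window_size][np.newaxis]))
loop_forecast = np.array(loop_forecast)[:, 0, 0]

# Batched version: all windows in one call, still in time order.
# The last window is dropped because it has no following value in the series.
windows = sliding_window_view(series, window_size)[:-1]
batch_forecast = predict(windows)[:, 0]

# forecast[i] predicts series[i + window_size], so this slice starts
# at the prediction for series[split_time].
valid_forecast = batch_forecast[split_time - window_size:]
```

Both versions give the same values in the same order. Seen this way, the loop is only a way to build ordered windows, and the slice at split_time - window_size keeps the predictions whose targets fall in the validation period.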
As for the point of using datasets - it can just help with code organization. There is no need to ever use one if you don't want to.