I have a timeseries dataset and I am trying to train a network so that it overfits (obviously, that's just the first step, I will then battle the overfitting).
The network has two layers: LSTM (32 neurons) and Dense (1 neuron, no activation)
Training/model has these parameters:
epochs: 20
, steps_per_epoch: 100
, loss: "mse"
, optimizer: "rmsprop"
.
TimeseriesGenerator
produces the input series with: length: 1
, sampling_rate: 1
, batch_size: 1
.
I would expect the network would just memorize such a small dataset (I have tried even much more complicated network to no avail) and the loss on training dataset would be pretty much zero. It is not and when I visualize the results on the training set like this:
y_pred = model.predict_generator(gen)
plot_points = 40
epochs = range(1, plot_points + 1)
pred_points = numpy.resize(y_pred[:plot_points], (plot_points,))
target_points = gen.targets[:plot_points]
plt.plot(epochs, pred_points, 'b', label='Predictions')
plt.plot(epochs, target_points, 'r', label='Targets')
plt.legend()
plt.show()
I get:
The predictions have somewhat smaller amplitude but are precisely inverse to the targets. Btw. this is not memorized, they are inversed even for the test dataset which the algorithm hasn't trained on at all.It appears that instead of memorizing the dataset, my network just learned to negate the input value and slightly scale it down. Any idea why this is happening? It doesn't seem like the solution the optimizer should have converged to (loss is pretty big).
EDIT (some relevant parts of my code):
train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
x,
y,
length=1,
sampling_rate=1,
batch_size=1,
shuffle=False
)
model = Sequential()
model.add(LSTM(32, input_shape=(1, 1), return_sequences=False))
model.add(Dense(1, input_shape=(1, 1)))
model.compile(
loss="mse",
optimizer="rmsprop",
metrics=[keras.metrics.mean_squared_error]
)
history = model.fit_generator(
train_gen,
epochs=20,
steps_per_epoch=100
)
EDIT (different, randomly generated dataset):
I had to increase number of LSTM neurons to 256, with the previous setting (32 neurons), the blue line was pretty much flat. However, with the increase the same pattern arises - inverse predictions with somewhat smaller amplitude.
EDIT (targets shifted by +1):
Shifting the targets by one compared to predictions doesn't produce much better fit. Notice the highlighted parts where the graph isn't just alternating, it's more apparent there.
EDIT (increased length to 2 ... TimeseriesGenerator(length=2, ...)
):
With length=2
the predictions stop tracking the targets so closely but the overall pattern of inversion still stands.
You say that your network "just learned to negate the input value and slightly scale it down". I don't think so. It is very likely that all you are seeing is the network performing poorly, and just predicting the previous value (but scaled as you say). This issue is something I've seen again and again. Here is another example, and another, of this issue. Also, remember it is very easy to fool yourself by shifting the data by one. It is very likely you are simply shifting the poor prediction back in time and getting an overlap.
EDIT: After author's comments I do not believe this is the correct answer but I will keep it posted for posterity.
Great question and the answer is due to how the Time_generator works! Apparently instead of grabbing x,y pairs with the same index (e.g input x[0]
to output target y[0]
) it grabs target with offset 1 (so x[0]
to y[1]
).
Thus plotting y with offset 1 will produce the desired fit.
Code to simulate:
import keras
import matplotlib.pyplot as plt
x=np.random.uniform(0,10,size=41).reshape(-1,1)
x[::2]*=-1
y=x[1:]
x=x[:-1]
train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
x,
y,
length=1,
sampling_rate=1,
batch_size=1,
shuffle=False
)
model = keras.models.Sequential()
model.add(keras.layers.LSTM(100, input_shape=(1, 1), return_sequences=False))
model.add(keras.layers.Dense(1))
model.compile(
loss="mse",
optimizer="rmsprop",
metrics=[keras.metrics.mean_squared_error]
)
model.optimizer.lr/=.1
history = model.fit_generator(
train_gen,
epochs=20,
steps_per_epoch=100
)
Proper plotting:
y_pred = model.predict_generator(train_gen)
plot_points = 39
epochs = range(1, plot_points + 1)
pred_points = np.resize(y_pred[:plot_points], (plot_points,))
target_points = train_gen.targets[1:plot_points+1] #NOTICE DIFFERENT INDEXING HERE
plt.plot(epochs, pred_points, 'b', label='Predictions')
plt.plot(epochs, target_points, 'r', label='Targets')
plt.legend()
plt.show()
Output, Notice how the fit is no longer inverted and is mostly very accurate:
This is how it looks when the offset is incorrect:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With