Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Validation Loss Much Higher Than Training Loss

I'm very new to deep learning models, and trying to train a multiple time series model using LSTM with Keras Sequential. There are 25 observations per year for 50 years = 1250 samples, so not sure if this is even possible to use LSTM for such small data. However, I have thousands of feature variables, not including time lags. I'm trying to predict a sequence of the next 25 time steps of data. The data is normalized between 0 and 1. My problem is that, despite trying many obvious adjustments, I cannot get the LSTM validation loss anywhere close to the training loss (overfitting dramatically, I think).

I have tried adjusting number of nodes per hidden layer (25-375), number of hidden layers (1-3), dropout (0.2-0.8), batch_size (25-375), and train/ test split (90%:10% - 50%-50%). Nothing really makes much of a difference on the validation loss/ training loss disparity.

# SPLIT INTO TRAIN AND TEST SETS
# 25 observations per year; Allocate 5 years (2014-2018) for Testing
n_test = 5 * 25
test = values[:n_test, :]
train = values[n_test:, :]

# split into input and outputs
train_X, train_y = train[:, :-25], train[:, -25:]
test_X, test_y = test[:, :-25], test[:, -25:]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 5, newdf.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 5, newdf.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)


# design network
model = Sequential()
model.add(Masking(mask_value=-99, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(375, return_sequences=True))
model.add(Dropout(0.8))
model.add(LSTM(125, return_sequences=True))
model.add(Dropout(0.8))
model.add(LSTM(25))
model.add(Dense(25))
model.compile(loss='mse', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=20, batch_size=25, validation_data=(test_X, test_y), verbose=2, shuffle=False)

Epoch 19/20

  • 14s - loss: 0.0512 - val_loss: 188.9568

Epoch 20/20

  • 14s - loss: 0.0510 - val_loss: 188.9537

link to plot of Val Loss / Train Loss

I assume I must be doing something obvious wrong, but can't realize it since I'm a newbie. I am hoping to either get some useful validation loss achieved (compared to training), or know that my data observations are simply not large enough for useful LSTM modeling. Any help or suggestions is much appreciated, thanks!

like image 753
Michael Bolton Avatar asked Mar 31 '19 19:03

Michael Bolton


People also ask

Why my validation loss is higher than training loss?

At times, the validation loss is greater than the training loss. This may indicate that the model is underfitting. Underfitting occurs when the model is unable to accurately model the training data, and hence generates large errors.

What to do if validation accuracy is higher than training accuracy?

The point here is that you are working on a large training Set and a too small validation/test set so that this latter is way too easy to computed. Try data augmentation and other technique to get your dataset bigger!

Should training and validation loss be the same?

Validation loss is used to ascertain performance on an entire epoch after the training loss has been calculated and weight updates have been completed. The loss function should be the same and the purpose is to make sense of models performance on unseen data – data that is not used to update the weights.

What is the relationship between training Loss and Validation loss?

The training loss indicates how well the model is fitting the training data, while the validation loss indicates how well the model fits new data.


1 Answers

Overfitting

In general, if you're seeing much higher validation loss than training loss, then it's a sign that your model is overfitting - it learns "superstitions" i.e. patterns that accidentally happened to be true in your training data but don't have a basis in reality, and thus aren't true in your validation data.

It's generally a sign that you have a "too powerful" model, too many parameters that are capable of memorizing the limited amount of training data. In your particular model you're trying to learn almost a million parameters (try printing model.summary()) from a thousand datapoints - that's not reasonable, learning can extract/compress information from data, not create it out of thin air.

What's the expected result?

The first question you should ask (and answer!) before building a model is about the expected accuracy. You should have a reasonable lower bound (what's a trivial baseline? For time series prediction, e.g. linear regression might be one) and an upper bound (what could an expert human predict given the same input data and nothing else?).

Much depends on the nature of the problem. You really have to ask, is this information sufficient to get a good answer? For many real life time problems with time series prediction, the answer is no - the future state of such a system depends on many variables that can't be determined by simply looking at historical measurements - to reasonably predict the next value, you need to bring in lots of external data other than the historical prices. There's a classic quote by Tukey: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

like image 54
Peteris Avatar answered Nov 15 '22 10:11

Peteris