I am training an LSTM autoencoder, but the loss function randomly shoots up, as in the picture below. I tried multiple things to prevent this, such as adjusting the batch size and adjusting the number of neurons in my layers, but nothing seems to help. I checked my input data for null / infinity values, but it contains none, and it is also normalized. Here is my code for reference:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, RepeatVector, TimeDistributed, Dense
from sklearn.model_selection import train_test_split

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(430, 3)))  # mask zero-padded timesteps
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, activation='relu'))  # encoder
model.add(RepeatVector(430))  # repeat the encoding for each output timestep
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, activation='relu', return_sequences=True))  # decoder
model.add(TimeDistributed(Dense(3)))  # reconstruct the 3 input features per timestep
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

context_paths = loadFile()  # my own data-loading routine
X_train, X_test = train_test_split(context_paths, test_size=0.20)
history = model.fit(X_train, X_train, epochs=1, batch_size=4, verbose=1, validation_data=(X_test, X_test))
The loss function explodes at random points in time, sometimes sooner, sometimes later. I read this thread about possible problems, but at this point, after trying multiple things, I am not sure what to do to prevent the loss function from skyrocketing at random. Any advice is appreciated. Other than this, I can see that my accuracy is not increasing very much, so the problems may be interconnected.
Two main points:

1st point: As highlighted by Daniel Möller, don't use 'relu' for LSTM; leave the standard activation, which is 'tanh'.

2nd point: One way to fix the exploding gradient is to use clipnorm or clipvalue for the optimizer.
Try something like this for the last two lines.
For clipnorm:
opt = tf.keras.optimizers.Adam(clipnorm=1.0)
For clipvalue:
opt = tf.keras.optimizers.Adam(clipvalue=0.5)
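As a minimal sketch, here is how the clipped optimizer would slot into your existing compile and fit calls (the clip values 1.0 and 0.5 are just common starting points to tune, not prescriptions):

import tensorflow as tf

# Clip the global gradient norm to 1.0 before Adam applies the update;
# use clipvalue=0.5 instead to clip each gradient element to [-0.5, 0.5].
opt = tf.keras.optimizers.Adam(clipnorm=1.0)

model.compile(optimizer=opt, loss='mean_squared_error')
history = model.fit(X_train, X_train, epochs=1, batch_size=4, verbose=1, validation_data=(X_test, X_test))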
See this post for help (previous version of TF): How to apply gradient clipping in TensorFlow?
And this post for general explanation: https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
Two main issues:

1. Don't use 'relu' for LSTM; leave the standard activation, which is 'tanh'. Because LSTMs are "recurrent", it's very easy for them to accumulate growing or decreasing values to the point of making the numbers useless.

2. Check the ranges of X_train and X_test. Make sure they're not too big. Something between -4 and +4 is sort of good. You should consider normalizing your data if it's not normalized yet.

Notice that "accuracy" doesn't make any sense for problems that are not classification. (I notice your final activation is "linear", so you're not doing classification, right?)
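As a sketch of point 1, the model from the question can be rebuilt by simply dropping the activation='relu' arguments, so both LSTM layers fall back to their default 'tanh' activation (everything else is kept as in the original code; the 'accuracy' metric is dropped since this is not classification):

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(430, 3)))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))  # default activation is 'tanh'
model.add(RepeatVector(430))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))  # default 'tanh'
model.add(TimeDistributed(Dense(3)))
model.compile(optimizer='adam', loss='mean_squared_error')

print(X_train.min(), X_train.max())  # for point 2: roughly within [-4, 4] is fine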
Finally, if the two hints above don't work, check whether you have an example that is all zeros; this might be creating a "full mask" sequence, and this "might" (I don't know) cause a bug.
(X_train == 0).all(axis=(1, 2)).any()  # should be False
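If that check does come back True, one way to proceed (a sketch, assuming X_train is a NumPy array of shape (samples, 430, 3)) is to drop the all-zero examples before training:

import numpy as np

# Keep only the sequences that contain at least one non-zero timestep.
keep = ~(X_train == 0).all(axis=(1, 2))
X_train = X_train[keep]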