I am training an LSTM autoencoder, but the loss function randomly shoots up, as in the picture below. I tried multiple things to prevent this, such as adjusting the batch size and adjusting the number of neurons in my layers, but nothing seems to help. I checked my input data for null / infinity values, but it contains none, and it is also normalized. Here is my code for reference:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, RepeatVector, TimeDistributed, Dense
from sklearn.model_selection import train_test_split

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(430, 3)))  # mask zero-padded timesteps
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, activation='relu'))  # encoder
model.add(RepeatVector(430))  # repeat the encoding for each output timestep
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, activation='relu', return_sequences=True))  # decoder
model.add(TimeDistributed(Dense(3)))  # reconstruct the 3 input features per timestep
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

context_paths = loadFile()  # my own data-loading routine
X_train, X_test = train_test_split(context_paths, test_size=0.20)
history = model.fit(X_train, X_train, epochs=1, batch_size=4, verbose=1, validation_data=(X_test, X_test))
The loss function explodes at random points in time, sometimes sooner, sometimes later. I read this thread about possible problems, but at this point, after trying multiple things, I am not sure what to do to prevent the loss function from skyrocketing at random. Any advice is appreciated. Other than this, I can see that my accuracy is not increasing very much, so the problems may be interconnected.
Two main points:

1st point: As highlighted by Daniel Möller, don't use 'relu' for LSTM; leave the standard activation, which is 'tanh'.

2nd point: One way to fix the exploding gradient is to use clipnorm or clipvalue for the optimizer.
Try something like this for the last two lines.
For clipnorm:
opt = tf.keras.optimizers.Adam(clipnorm=1.0)
For clipvalue:
opt = tf.keras.optimizers.Adam(clipvalue=0.5)
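As a minimal sketch, here is how the clipped optimizer would slot into your existing compile and fit calls (the clip values 1.0 and 0.5 are just common starting points to tune, not prescriptions):

import tensorflow as tf

# Clip the global gradient norm to 1.0 before Adam applies the update;
# use clipvalue=0.5 instead to clip each gradient element to [-0.5, 0.5].
opt = tf.keras.optimizers.Adam(clipnorm=1.0)

model.compile(optimizer=opt, loss='mean_squared_error')
history = model.fit(X_train, X_train, epochs=1, batch_size=4, verbose=1, validation_data=(X_test, X_test))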
See this post for help (previous version of TF): How to apply gradient clipping in TensorFlow?
And this post for general explanation: https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
Two main issues:

1. Don't use 'relu' for LSTM; leave the standard activation, which is 'tanh'. Because LSTMs are "recurrent", it's very easy for them to accumulate growing or decreasing values to the point of making the numbers useless.

2. Check the ranges of X_train and X_test. Make sure they're not too big. Something between -4 and +4 is sort of good. You should consider normalizing your data if it's not normalized yet.

Notice that "accuracy" doesn't make any sense for problems that are not classification. (I notice your final activation is "linear", so you're not doing classification, right?)
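As a sketch of point 1, the model from the question can be rebuilt by simply dropping the activation='relu' arguments, so both LSTM layers fall back to their default 'tanh' activation (everything else is kept as in the original code; the 'accuracy' metric is dropped since this is not classification):

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(430, 3)))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))  # default activation is 'tanh'
model.add(RepeatVector(430))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))  # default 'tanh'
model.add(TimeDistributed(Dense(3)))
model.compile(optimizer='adam', loss='mean_squared_error')

print(X_train.min(), X_train.max())  # for point 2: roughly within [-4, 4] is fine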
Finally, if the two hints above don't work, check whether you have an example that is all zeros; this might be creating a "full mask" sequence, and this "might" (I don't know) cause a bug.
(X_train == 0).all(axis=(1, 2)).any()  # should be False
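If that check does come back True, one way to proceed (a sketch, assuming X_train is a NumPy array of shape (samples, 430, 3)) is to drop the all-zero examples before training:

import numpy as np

# Keep only the sequences that contain at least one non-zero timestep.
keep = ~(X_train == 0).all(axis=(1, 2))
X_train = X_train[keep]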