I built an RNN using Keras. The RNN is used to solve a regression problem:
from keras.models import Sequential
from keras.layers import Dense, LSTM, BatchNormalization, TimeDistributed
from keras.optimizers import RMSprop

def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
    model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=512, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(output_dim=1, activation='relu')))  # one output per timestep (sequence labeling)
    rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error',
                  optimizer=rmsprop,
                  metrics=['mean_squared_error'])
    return model
The whole process looks fine, but the loss stays exactly the same over epochs.
61267 in the training set
6808 in the test set
Building training input vectors ...
888 unique feature names
The length of each vector will be 888
Using TensorFlow backend.
Build model...
# Each batch has 1280 examples
# The training data are shuffled at the beginning of each epoch.
****** Iterating over each batch of the training data ******
Epoch 1/3 : Batch 1/48 | loss = 11011073.000000 | root_mean_squared_error = 3318.232910
Epoch 1/3 : Batch 2/48 | loss = 620.271667 | root_mean_squared_error = 24.904161
Epoch 1/3 : Batch 3/48 | loss = 620.068665 | root_mean_squared_error = 24.900017
......
Epoch 1/3 : Batch 47/48 | loss = 618.046448 | root_mean_squared_error = 24.859678
Epoch 1/3 : Batch 48/48 | loss = 652.977051 | root_mean_squared_error = 25.552946
****** Epoch 1: RMSD(training) = 24.897174
Epoch 2/3 : Batch 1/48 | loss = 607.372620 | root_mean_squared_error = 24.644049
Epoch 2/3 : Batch 2/48 | loss = 599.667786 | root_mean_squared_error = 24.487448
Epoch 2/3 : Batch 3/48 | loss = 621.368103 | root_mean_squared_error = 24.926300
......
Epoch 2/3 : Batch 47/48 | loss = 620.133667 | root_mean_squared_error = 24.901398
Epoch 2/3 : Batch 48/48 | loss = 639.971924 | root_mean_squared_error = 25.297264
****** Epoch 2: RMSD(training) = 24.897174
Epoch 3/3 : Batch 1/48 | loss = 651.519836 | root_mean_squared_error = 25.523636
Epoch 3/3 : Batch 2/48 | loss = 673.582581 | root_mean_squared_error = 25.952084
Epoch 3/3 : Batch 3/48 | loss = 613.930054 | root_mean_squared_error = 24.776562
......
Epoch 3/3 : Batch 47/48 | loss = 624.460327 | root_mean_squared_error = 24.988203
Epoch 3/3 : Batch 48/48 | loss = 629.544250 | root_mean_squared_error = 25.090448
****** Epoch 3: RMSD(training) = 24.897174
I do NOT think this is normal. Am I missing something?
UPDATE: I found that all predictions are always zero after all epochs. This is why all the RMSDs are the same: the predictions are all identical, i.e. 0. I checked the training y; it contains only a few zeros, so it is not due to data imbalance.
So now I am wondering whether it is caused by the layers and activations that I am using.
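Here is roughly the check I ran (X_train and y_train stand for my prepared arrays of shape (num_examples, 100, 888) and (num_examples, 100, 1), and model is the trained network from above):

import numpy as np

preds = model.predict(X_train, batch_size=1280)
print(np.unique(preds))        # a single value, 0.0 -- every prediction is zero
print((y_train == 0).mean())   # only a small fraction of the targets are zero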
Usually, when a model overfits, the validation loss goes up while the training loss keeps going down from the point of overfitting. But in my case the training loss still goes down while the validation loss stays at the same level; hence the validation accuracy also stays at the same level while the training accuracy goes up.
Solutions to this are to decrease your network size or to increase dropout; for example, you could try a dropout of 0.5 and so on (see the sketch below). If your training and validation losses are about equal, then your model is underfitting: increase the size of your model (either the number of layers or the number of neurons per layer).
The regularization terms are only applied while training the model on the training set, inflating the training loss. During validation and testing, the loss function only comprises the prediction error, resulting in a generally lower loss than on the training set.
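To illustrate the dropout suggestion, here is a minimal sketch (hypothetical layer sizes in the Keras 1.x style used in the question, not tuned for this data set):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, TimeDistributed

model = Sequential()
model.add(LSTM(512, input_shape=(100, 888), return_sequences=True))
model.add(Dropout(0.5))                      # randomly drop half of the activations during training
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')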
Your RNN function seems to be OK.
The speed at which the loss decreases depends on the optimizer and the learning rate.
Anyhow, you are using a decay rate of 0.9, so try a bigger learning rate; it is going to decrease at the 0.9 rate anyway.
Try out other optimizers with different learning rates. The other optimizers available in Keras are listed here: https://keras.io/optimizers/
Often, some optimizers work well on some data sets while others may fail.
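For example (the learning rates below are just common starting points, and X_train / y_train stand for the question's training arrays), you could compare a few optimizers like this:

from keras.optimizers import RMSprop, Adam, SGD

candidates = {
    'rmsprop': RMSprop(lr=0.001),          # the Keras default lr, much larger than 1e-5
    'adam':    Adam(lr=0.001),
    'sgd':     SGD(lr=0.01, momentum=0.9),
}

for name, opt in candidates.items():
    model = RNN_keras(feat_num=888, timestep_num=100)
    model.compile(loss='mean_squared_error', optimizer=opt)   # recompile with the new optimizer
    history = model.fit(X_train, y_train, batch_size=1280, nb_epoch=3, verbose=0)
    print(name, history.history['loss'][-1])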
Have you tried changing the activation function from relu to softmax?
ReLU activation has a tendency to diverge. However, initializing the weights with an identity matrix may result in better convergence.
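One way to try moving away from ReLU (a sketch of my own adaptation, not something from the answer above: since a softmax over a single-unit regression output would always produce 1, the sketch instead uses the bounded tanh activation inside the LSTMs, which is the Keras default, and a linear output for the regression target):

from keras.models import Sequential
from keras.layers import LSTM, Dense, BatchNormalization, TimeDistributed

model = Sequential()
model.add(BatchNormalization(input_shape=(100, 888)))
# tanh (the Keras default) is bounded and cannot get stuck at zero the way ReLU can
model.add(LSTM(512, activation='tanh', return_sequences=True))
model.add(LSTM(128, activation='tanh', return_sequences=True))
# linear output for an unbounded regression target
model.add(TimeDistributed(Dense(1, activation='linear')))
model.compile(loss='mean_squared_error', optimizer='rmsprop')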