I am training the last layer of VGG16 in Keras. My model looks like this:
map_characters1 = {0: 'No Pneumonia', 1: 'Yes Pneumonia'}
class_weight1 = class_weight.compute_class_weight('balanced', np.unique(y_train), y_train)
weight_path1 = './imagenet_models/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
pretrained_model_1 = VGG16(weights='imagenet', include_top=False, input_shape=(200, 200, 3))
optimizer1 = keras.optimizers.Adam(lr=0.0001)

def pretrainedNetwork(xtrain, ytrain, xtest, ytest, pretrainedmodel, pretrainedweights,
                      classweight, numclasses, numepochs, optimizer, labels):
    base_model = pretrained_model_1  # Topless
    # Add top layer
    x = base_model.output
    x = Flatten()(x)
    predictions = Dense(numclasses, activation='relu')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    # Train top layer
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    callbacks_list = [keras.callbacks.EarlyStopping(monitor='val_acc', patience=3, verbose=1)]
    model.summary()
    # Fit model
    history = model.fit(xtrain, ytrain, epochs=numepochs, class_weight=classweight,
                        validation_data=(xtest, ytest), verbose=1,
                        callbacks=[MetricsCheckpoint('logs')])
    # Evaluate model
    score = model.evaluate(xtest, ytest, verbose=0)
    print('\nKeras CNN - accuracy:', score[1], '\n')
    return model
The training looks fine at the beginning: the loss decreases and the accuracy increases. But then the loss becomes nan and the accuracy drops to 0.5, the same as a random guess.
The model:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 200, 200, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 200, 200, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 200, 200, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 100, 100, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 100, 100, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 100, 100, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 50, 50, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 50, 50, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 50, 50, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 50, 50, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 25, 25, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 25, 25, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 25, 25, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 25, 25, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 12, 12, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 12, 12, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 12, 12, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 12, 12, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 6, 6, 512)         0
_________________________________________________________________
flatten_2 (Flatten)          (None, 18432)             0
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 36866
=================================================================
Total params: 14,751,554
Trainable params: 36,866
Non-trainable params: 14,714,688
Training output:
Train on 2682 samples, validate on 468 samples
Epoch 1/6
2682/2682 [==============================] - 621s 232ms/step - loss: 1.5150 - acc: 0.7662 - val_loss: 0.4117 - val_acc: 0.8526
Epoch 2/6
2682/2682 [==============================] - 615s 229ms/step - loss: 0.2535 - acc: 0.9459 - val_loss: 1.7812 - val_acc: 0.7009
Epoch 3/6
2682/2682 [==============================] - 621s 232ms/step - loss: nan - acc: 0.7468 - val_loss: nan - val_acc: 0.5000
Epoch 4/6
2682/2682 [==============================] - 644s 240ms/step - loss: nan - acc: 0.5000 - val_loss: nan - val_acc: 0.5000
Epoch 5/6
2682/2682 [==============================] - 616s 230ms/step - loss: nan - acc: 0.5000 - val_loss: nan - val_acc: 0.5000
Where could the problem be? What is happening with the loss?
You have an exploding gradient. To simplify, consider a convex optimization by gradient descent. The goal of the neural network is to optimize the weights so that the derivative of the loss becomes zero, at the bottom (shown in green) of the figure below:

[figure: loss surface with gradient descent steps converging to the minimum, marked in green]
An exploding gradient occurs when the gradient becomes almost parallel to the loss (sum of squared errors) axis, producing nans.
There are several fixes for this, such as batch normalization, careful weight initialization, ReLU activation functions, and a smaller learning rate. For vanishing gradients in LSTMs, even the choice of optimizer matters.
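As a rough sketch of how two of those fixes would look in this setup (reusing base_model, numclasses and optimizer1 from the question, so treat it as illustrative rather than a drop-in replacement):

from keras.layers import BatchNormalization, Flatten
import keras

# Normalize the activations feeding the new classification head.
x = Flatten()(base_model.output)
x = BatchNormalization()(x)
# ... Dense prediction layer and Model(...) as in the question ...

# Use a learning rate an order of magnitude smaller than the question's 1e-4.
optimizer1 = keras.optimizers.Adam(lr=1e-5)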
If your learning rate is not small enough, training may zig-zag across the gradient and miss the local minimum:

[figure: gradient descent steps zig-zagging across the loss surface instead of settling in the minimum]
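A toy example (not part of the original post) makes this concrete: on the one-dimensional loss f(w) = w^2, plain gradient descent converges for a small learning rate, but with a learning rate that is too large it overshoots further on every step until the value overflows to inf and then nan, the same symptom as in the training log above:

def gradient_descent(lr, steps=5000, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w   # gradient of f(w) = w**2 is 2*w
    return w

print(gradient_descent(lr=0.1))  # ~0.0: each step moves w closer to the minimum
print(gradient_descent(lr=1.1))  # nan: |w| grows by 20% per step until it overflows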
The problem was that I used activation='relu' in the prediction layer. I changed it to 'softmax' and now it works!
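For reference, the change is a single line in the top-layer definition. A likely explanation for the nan: a relu output is not a probability distribution (its values are unbounded and can all be zero at the same time), which lets categorical_crossentropy divide or take a log where it should not, while softmax keeps the outputs strictly positive and summing to one.

# Corrected prediction layer: softmax instead of relu on the output
predictions = Dense(numclasses, activation='softmax')(x)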