I am using an adapted LeNet model in Keras for binary classification. I have about 250,000 training samples with a 60/40 class ratio. The model trains very well: after the first epoch the accuracy reaches 97% with a loss of 0.07, and after 10 epochs the accuracy is over 99% with a loss of 0.01. I am using a checkpointer to save my models whenever they improve.
Around the 11th epoch, the accuracy drops to around 55% with a loss of around 6. How is this possible? Is it because the model cannot become more accurate, so it tries to find better weights and completely fails?
My model is an adaptation of the LeNet model:
```python
from keras import models
from keras.layers import (Activation, BatchNormalization, Convolution2D,
                          Dense, Dropout, Flatten, MaxPooling2D)
from keras.optimizers import Adam

# filt_size, kern_size, maxpool_size, input_shape and n_classes are defined
# elsewhere in the original script (n_classes is 2 for this task).

lenet_model = models.Sequential()
# Block 1: conv -> ReLU -> batch norm -> max pooling
lenet_model.add(Convolution2D(filters=filt_size, kernel_size=(kern_size, kern_size), padding='valid',
                              input_shape=input_shape))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
# Block 2: 64 filters
lenet_model.add(Convolution2D(filters=64, kernel_size=(kern_size, kern_size), padding='valid'))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
# Block 3: 128 filters
lenet_model.add(Convolution2D(filters=128, kernel_size=(kern_size, kern_size), padding='valid'))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
# Fully connected classifier head
lenet_model.add(Flatten())
lenet_model.add(Dense(1024, kernel_initializer='uniform'))
lenet_model.add(Activation('relu'))
lenet_model.add(Dense(512, kernel_initializer='uniform'))
lenet_model.add(Activation('relu'))
lenet_model.add(Dropout(0.2))
# Output: n_classes units with softmax
lenet_model.add(Dense(n_classes, kernel_initializer='uniform'))
lenet_model.add(Activation('softmax'))
# binary_crossentropy paired with a multi-unit softmax output
lenet_model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
```
The problem lay in applying `binary_crossentropy` loss, whereas in this case `categorical_crossentropy` should be applied (with one-hot labels of shape `(n_samples, 2)`). Another approach is to keep the `binary_crossentropy` loss but change the output layer to a single unit, i.e. `Dense(1, ...)` instead of `Dense(n_classes, ...)`, with a `sigmoid` activation instead of `softmax`, and use 0/1 labels. The weird behaviour comes from the fact that with `binary_crossentropy` on a two-unit output, each output unit is treated as its own binary label, so a multi-label problem with two labels is solved, whereas your task is single-label binary classification.