Training and Loss not changing in Keras CNN model

I am running a CNN for left and right shoeprint classification. I have 190,000 training images and use 10% of them for validation. My model is set up as shown below: I get the paths of all the images, read them in, resize them, normalize them, and then fit the model. My issue is that I am stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong with what I am doing? How can I stop this from happening?

Just some interesting points to note:

  1. I first tested this on 10 images. I was having the same issue, but changing the optimizer to adam and the batch size to 4 worked.

  2. I then tested on more and more images, but each time I needed to change the batch size to get improvements in the accuracy and loss. With 10,000 images I had to use a batch size of 500 and the rmsprop optimizer. However, the accuracy and loss only really began to change after epoch 10.

  3. I am now training on 190,000 images and I cannot increase the batch size, as my GPU is at its max.

    imageWidth = 50
    imageHeight = 150
    
    def get_filepaths(directory):
        file_paths = []
        for root, directories, files in os.walk(directory):
            for filename in files:
                filepath = os.path.join(root, filename)
                file_paths.append(filepath)  # Add it to the list.
        return file_paths
    
    def cleanUpPaths(fullFilePaths):
        cleanPaths = []
        for f in fullFilePaths:
            if f.endswith(".png"):
                cleanPaths.append(f)
        return cleanPaths
    
    def getTrainData(paths):
        trainData = []
        for i in xrange(1,190000,2):
            im = image.imread(paths[i])
            im = image.imresize(im, (150,50))
            im = (im-255)/float(255)
            trainData.append(im)
        trainData = np.asarray(trainData)
        right = np.zeros(47500)
        left = np.ones(47500)
        trainLabels = np.concatenate((left, right))
        trainLabels = np_utils.to_categorical(trainLabels)
        return (trainData, trainLabels)

    #create the convnet
    model = Sequential()
    
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth,imageHeight,1),strides=1))#32
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (3, 3), activation='relu',strides=1))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 3)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (1, 2), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Dropout(0.25))
    
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    
    sgd = SGD(lr=0.01)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
    
    # prepare the training data
    
    trainPaths = get_filepaths("better1/train")
    trainPaths = cleanUpPaths(trainPaths)
    (trainData, trainLabels) = getTrainData(trainPaths)
    trainData = np.reshape(trainData,(95000,imageWidth,imageHeight,1)).astype('float32')
    trainData = (trainData-255)/float(255)
    
    # train the convnet
    model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)
    
    # save the model and weights
    model.save('myConvnet_model5.h5');
    model.save_weights('myConvnet_weights5.h5');
asked Apr 28 '17 by TriniPhantom

2 Answers

You can try adding a BatchNormalization() layer after each MaxPooling2D() layer. It worked for me.
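A minimal sketch of what this might look like with the question's architecture (only the first two blocks are shown; the placement of BatchNormalization after pooling and before Dropout is one reasonable choice, not the only one):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Dropout, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 150, 1)))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(BatchNormalization())   # normalise activations after pooling
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(BatchNormalization())
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])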

answered Oct 14 '22 by tangerine


I've had this issue a number of times now, so I thought I'd make a little recap of it and the possible solutions, etc., to help people in the future.

Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*

Confirming the issue is occurring: Method 1: the model's accuracy stays around 0.5 while training (or 1/n, where n is the number of classes). Method 2: get the counts of each class in the predictions and confirm it is predicting only one class (see the sketch below).
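A minimal sketch of Method 2, assuming you have a compiled Keras model and a NumPy array of validation images (the names model and validation_images here are illustrative):

    import numpy as np

    # predicted_probs has shape (num_samples, num_classes)
    predicted_probs = model.predict(validation_images)
    predicted_classes = np.argmax(predicted_probs, axis=1)

    # If one class accounts for (almost) all predictions, the model has collapsed.
    classes, counts = np.unique(predicted_classes, return_counts=True)
    print(dict(zip(classes, counts)))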

Fixes/Checks (in somewhat of an order):

  • Double Check Model Architecture: use model.summary() and inspect the model.
  • Check Data Labels: make sure the labelling of your train data hasn't got mixed up somewhere in the preprocessing etc. (it happens!)
  • Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance, if using ImageDataGenerator().flow_from_directory(PATH), check that the param shuffle=True and that batch_size is greater than 1.
  • Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code (see the sketch after this list).
  • Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note that you will have to fully reinitialise the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
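
A minimal sketch of the last two checks, assuming a pre-trained Keras base (VGG16 is used here purely as an example) with a new classification head; the layer sizes and learning rate are illustrative:

    from keras.applications import VGG16
    from keras.models import Model
    from keras.layers import Flatten, Dense
    from keras.optimizers import SGD

    # Pre-trained base: freeze it so only the new head trains at first.
    base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
    for layer in base.layers:
        layer.trainable = False

    # New, randomly initialised head.
    x = Flatten()(base.output)
    x = Dense(256, activation='relu')(x)
    predictions = Dense(2, activation='softmax')(x)
    model = Model(inputs=base.input, outputs=predictions)

    # Ramp the learning rate down by factors of 10 until training moves
    # (e.g. 1e-3, 1e-4, 1e-5, ...). Re-create the head layers each time you
    # change the learning rate so their weights are reinitialised.
    model.compile(optimizer=SGD(lr=1e-4),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])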

If any of you know of more fixes/checks that could possibly get the model training properly, then please do contribute and I'll try to update the list.

**Note that it is common to make more of the pre-trained model trainable once the new layers have been initially trained "enough".

*Other names for the issue to help searches get here... keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output

answered Oct 15 '22 by DBCerigo