Training and Loss not changing in Keras CNN model

I am running a CNN for left and right shoeprint classification. I have 190,000 training images and use 10% of them for validation. My model is set up as shown below: I get the paths of all the images, read them in, resize them, normalize them, and then fit the model. My issue is that I am stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong with what I am doing? How can I stop this from happening?

Just some interesting points to note:

  1. I first tested this on 10 images. I was having the same issue, but changing the optimizer to adam and the batch size to 4 worked.

  2. I then tested on more and more images, but each time I needed to change the batch size to get improvements in the accuracy and loss. With 10,000 images I had to use a batch size of 500 and the rmsprop optimizer. However, the accuracy and loss only really began to change after epoch 10.

  3. I am now training on 190,000 images and I cannot increase the batch size, as my GPU is at its max.

    imageWidth = 50
    imageHeight = 150
    
    def get_filepaths(directory):
        file_paths = []
        for root, directories, files in os.walk(directory):
            for filename in files:
                filepath = os.path.join(root, filename)
                file_paths.append(filepath)  # Add it to the list.
        return file_paths
    
    def cleanUpPaths(fullFilePaths):
        cleanPaths = []
        for f in fullFilePaths:
            if f.endswith(".png"):
                cleanPaths.append(f)
        return cleanPaths
    
    def getTrainData(paths):
        trainData = []
        for i in xrange(1,190000,2):
            im = image.imread(paths[i])
            im = image.imresize(im, (150,50))
            im = (im-255)/float(255)
            trainData.append(im)
        trainData = np.asarray(trainData)
        right = np.zeros(47500)
        left = np.ones(47500)
        trainLabels = np.concatenate((left, right))
        trainLabels = np_utils.to_categorical(trainLabels)
        return (trainData, trainLabels)

    #create the convnet
    model = Sequential()
    
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth,imageHeight,1),strides=1))#32
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (3, 3), activation='relu',strides=1))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 3)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (1, 2), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Dropout(0.25))
    
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    
    sgd = SGD(lr=0.01)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
    
    # prepare the training data
    
    trainPaths = get_filepaths("better1/train")
    trainPaths = cleanUpPaths(trainPaths)
    (trainData, trainLabels) = getTrainData(trainPaths)
    trainData = np.reshape(trainData,(95000,imageWidth,imageHeight,1)).astype('float32')
    trainData = (trainData-255)/float(255)
    
    # train the convnet
    model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)
    
    # save the model and weights
    model.save('myConvnet_model5.h5');
    model.save_weights('myConvnet_weights5.h5');
asked Apr 28 '17 by TriniPhantom

2 Answers

You can try adding a BatchNormalization() layer after each MaxPooling2D() layer. It worked for me.
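A minimal sketch of what this might look like with the question's architecture (only the first two blocks are shown; the placement of BatchNormalization after pooling and before Dropout is one reasonable choice, not the only one):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Dropout, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 150, 1)))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(BatchNormalization())   # normalise activations after pooling
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(BatchNormalization())
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])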

answered Oct 14 '22 by tangerine


I've had this issue a number of times now, so I thought I'd make a little recap of it and the possible solutions, etc., to help people in the future.

Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*

Confirming the issue is occurring: Method 1: the model's accuracy stays around 0.5 while training (or 1/n, where n is the number of classes). Method 2: get the counts of each class in the predictions and confirm it is predicting only one class (see the sketch below).
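A minimal sketch of Method 2, assuming you have a compiled Keras model and a NumPy array of validation images (the names model and validation_images here are illustrative):

    import numpy as np

    # predicted_probs has shape (num_samples, num_classes)
    predicted_probs = model.predict(validation_images)
    predicted_classes = np.argmax(predicted_probs, axis=1)

    # If one class accounts for (almost) all predictions, the model has collapsed.
    classes, counts = np.unique(predicted_classes, return_counts=True)
    print(dict(zip(classes, counts)))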

Fixes/Checks (in somewhat of an order):

  • Double Check Model Architecture: use model.summary() and inspect the model.
  • Check Data Labels: make sure the labelling of your train data hasn't got mixed up somewhere in the preprocessing etc. (it happens!)
  • Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance, if using ImageDataGenerator().flow_from_directory(PATH), check that the param shuffle=True and that batch_size is greater than 1.
  • Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code (see the sketch after this list).
  • Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note that you will have to fully reinitialise the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
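
A minimal sketch of the last two checks, assuming a pre-trained Keras base (VGG16 is used here purely as an example) with a new classification head; the layer sizes and learning rate are illustrative:

    from keras.applications import VGG16
    from keras.models import Model
    from keras.layers import Flatten, Dense
    from keras.optimizers import SGD

    # Pre-trained base: freeze it so only the new head trains at first.
    base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
    for layer in base.layers:
        layer.trainable = False

    # New, randomly initialised head.
    x = Flatten()(base.output)
    x = Dense(256, activation='relu')(x)
    predictions = Dense(2, activation='softmax')(x)
    model = Model(inputs=base.input, outputs=predictions)

    # Ramp the learning rate down by factors of 10 until training moves
    # (e.g. 1e-3, 1e-4, 1e-5, ...). Re-create the head layers each time you
    # change the learning rate so their weights are reinitialised.
    model.compile(optimizer=SGD(lr=1e-4),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])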

If any of you know of more fixes/checks that could possibly get the model training properly, then please do contribute and I'll try to update the list.

**Note that it is common to make more of the pre-trained model trainable once the new layers have been initially trained "enough".

*Other names for the issue to help searches get here... keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output

answered Oct 15 '22 by DBCerigo