 

Neural Network flatlines after one epoch

I'm using Keras to build a convolutional neural network to classify images into two distinct classes, and for some reason the accuracy never changes after the first epoch.

After using Keras's to_categorical(), my labels look like:

[[0.  1.]
 [1.  0.]
 [1.  0.]
 [0.  1.]]

and the code for my model is:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD
from sklearn.utils import shuffle

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=[5, 5], strides=1, padding='same', activation='relu', input_shape=(imageSize, imageSize, 3)))
model.add(MaxPooling2D())
model.add(Conv2D(filters=64, kernel_size=[5, 5], strides=1, padding='same', activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(2))
sgd = SGD()  # Use stochastic gradient descent for now
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.summary()

counter = 0
# Train one epoch at a time so we can shuffle the data in between
for x in range(trainingEpochs):

    counter += 1
    print()  # New line
    print('Epoch ' + str(counter))

    trainingImages, trainingLabels = shuffle(trainingImages, trainingLabels, random_state=0)  # Shuffle both sets in unison

    model.fit(x=trainingImages, y=trainingLabels, batch_size=32, epochs=1, verbose=2)

This code results in the output:

Epoch 1
36s - loss: 5.0770 - acc: 0.3554

Epoch 2
36s - loss: 4.9421 - acc: 0.3066

Epoch 3
36s - loss: 4.9421 - acc: 0.3066

Epoch 4
36s - loss: 4.9421 - acc: 0.3066

So far I've tried changing the batch size, using binary_crossentropy, changing the shuffling method, changing the convolution parameters, using black-and-white photos instead of RGB, using different image sizes, using Adam instead of SGD, and using a lower learning rate for SGD, but none of those have fixed the problem. I'm at a loss; does anyone have any ideas?

Edit: trainingImages has a shape of (287, 256, 256, 3) if that matters at all.

asked by Pecans
1 Answer

The symptom is that the training loss stops improving relatively early. Assuming your problem is learnable at all, there are many possible reasons for this behavior. These are off the top of my head:

  1. Improper preprocessing of input:

Neural networks prefer inputs with zero mean. E.g., if the input is all positive, the weight updates are all restricted to move in the same direction, which may not be desirable (https://youtu.be/gYpoJMlgyXA).

Therefore, you may want to subtract the mean from all the images (e.g., subtract 127.5 from each of the 3 channels). Scaling each channel to unit standard deviation may also help; a minimal sketch is below.
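For example, assuming trainingImages is the (287, 256, 256, 3) array from the question with raw pixel values in [0, 255] (an assumption about the data), the normalization could look like:

import numpy as np

trainingImages = trainingImages.astype('float32')
trainingImages -= 127.5                                               # roughly zero-center each channel
trainingImages /= trainingImages.std(axis=(0, 1, 2), keepdims=True)   # approx. unit std per channel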

  2. Generalization ability of the network:

The network is not complicated or deep enough for the task.

This is very easy to check. You can train the network on just a few images (say, 3 to 10). The network should be able to overfit the data and drive the loss to almost 0. If that is not the case, you may have to add more layers, for example more than one Dense layer. A sketch of this sanity check follows.
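As a rough sketch, reusing the model and variable names from the question (the subset size, batch size, and epoch count here are arbitrary assumptions):

# Sanity check: try to overfit a tiny subset of the data.
# If the loss does not get close to 0, the network (or something else) is the problem.
fewImages = trainingImages[:10]
fewLabels = trainingLabels[:10]
model.fit(x=fewImages, y=fewLabels, batch_size=2, epochs=200, verbose=2)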

Another good idea is to use pre-trained weights (see the Applications section of the Keras documentation). You can adjust the Dense layers at the top to fit your problem; a sketch of this approach follows.
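A minimal sketch using a frozen VGG16 base from keras.applications (the classifier size and optimizer here are illustrative assumptions, not part of the question):

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

# Frozen convolutional base with pre-trained ImageNet weights
base = VGG16(weights='imagenet', include_top=False, input_shape=(256, 256, 3))
for layer in base.layers:
    layer.trainable = False

# Small trainable classifier on top for the 2-class problem
x = Flatten()(base.output)
x = Dense(64, activation='relu')(x)
out = Dense(2, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])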

  3. Improper weight initialization.

Improper weight initialization can prevent the network from converging (https://youtu.be/gYpoJMlgyXA, the same video as before).

For the ReLU activation, you may want to use He initialization instead of the default Glorot initialization. I find that this may be necessary sometimes, but not always. A sketch is below.
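Keras layers accept a kernel_initializer argument, so the question's model could be rebuilt with 'he_normal' like this (whether it helps in this particular case is an assumption):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

imageSize = 256  # assumption: matches the (287, 256, 256, 3) shape mentioned in the edit

# Same architecture as in the question, but with He initialization on every layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=[5, 5], strides=1, padding='same',
                 activation='relu', kernel_initializer='he_normal',
                 input_shape=(imageSize, imageSize, 3)))
model.add(MaxPooling2D())
model.add(Conv2D(filters=64, kernel_size=[5, 5], strides=1, padding='same',
                 activation='relu', kernel_initializer='he_normal'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(2, kernel_initializer='he_normal'))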

Lastly, you can use debugging tools for Keras such as keras-vis, keplr-io, and deep-viz-keras. They are very useful for opening the black box of convolutional networks.

answered by Ngoc Anh Huynh