How to prevent a lazy Convolutional Neural Network? I end with a ‘lazy CNN’ after training it with KERAS. Whatever the input is, the output is constant. What do you think the problem is?
I try to repeat an experiment of NVIDIA’s End to End Learning for Self-Driving Cars the paper. Absolutely, I do not have a real car but a Udacity’s simulator . The simulator generates figures about the foreground of a car.
A CNN receives the figure, and it gives the steering angle to keep the car in the track. The rule of the game is to keep the simulated car runs in the track safely. It is not very difficult.
The strange thing is sometimes I end with a lazy CNN after training it with KERAS, which gives constant steering angles. The simulated car will go off the trick, but the output of the CNN has no change. Especially the layer gets deeper, e.g. the CNN in the paper.
If I use a CNN like this, I can get a useful model after training.
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(36, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(48, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(50))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(1))
But, if I use a deeper CNN, I have more chance to receive a lazy CNN. Specifically, if I use a CNN which likes NVIDIA’s, I almost receive a lazy CNN after every training.
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(36, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(48, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, strides=(1, 1)))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, strides=(1, 1)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(1164))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(50))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(1))
I use ‘relu’ for convolution layers, and the activation function for the fully connected layer is ‘sigmoid’. I try to change the activation function, but there is no effect.
There is my analysis. I do not agree with a bug in my program because I can successfully drive the car with same codes and a simpler CNN. I think the reason is the simulator or the structure of the neural network. In a real self-driving car, the training signal, that is the steering angle, should contain noise; therefor, the driver never holds the wheel still in the real road. But in the simulator, the training signal is very clean. Almost 60% of the steering angle is zero. The optimizer can easily do the job by turning the output of CNN close to the zero. It seems the optimizer is lazy too. However, when we really want this CNN output something, it also gives zeros. So, I add small noise for these zero steering angles. The chance that I get a lazy CNN is smaller, but it is not disappearing.
What do you think about my analysis? Is there other strategy that I can use? I am wondering whether similar problems have been solved in the long history of CNN research.
resource:
The related files have been uploaded to GitHub. You can repeat the entire experiment with these files.
I can't run your model, because neither the question not the GitHub repo contains the data. That's why I am 90% sure of my answer.
But I think the main problem of your network is the sigmoid
activation function after dense layers. I assume, it will train well when there's just two of them, but four is too much.
Unfortunately, NVidia's End to End Learning for Self-Driving Cars paper doesn't specify it explicitly, but these days the default activation is no longer sigmoid
(as it once was), but relu
. See this discussion if you're interested why that is so. So the solution I'm proposing is try this model:
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(36, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(48, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation="relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation="relu"))
model.add(Flatten())
model.add(Dense(1164, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(10, activation="relu"))
model.add(Dense(1))
It mimics the NVidia's network architecture and does not suffer from the vanishing gradients.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With