
Do convolutional neural networks suffer from the vanishing gradient?

I think I read somewhere that convolutional neural networks do not suffer from the vanishing gradient problem as much as standard sigmoid neural networks do as the number of layers increases. But I have not been able to find a 'why'.

Do they truly not suffer from the problem, or am I wrong and it depends on the activation function? [I have been using rectified linear units, so I have never tested sigmoid units for convolutional neural networks.]

Roy asked Mar 09 '15

People also ask

Do convolutional neural networks use gradient descent?

Yes. Convolutional neural networks are typically trained by combining the backpropagation mechanism with the gradient descent method: backpropagation computes the gradient of the loss with respect to each layer's parameters, and gradient descent uses those gradients to update the weights.
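
As a concrete illustration (a minimal sketch of my own, not from the quoted text, with an arbitrary toy architecture and hyperparameters), this is roughly what one training step of a CNN with backpropagation plus SGD looks like in PyTorch:

```python
# Minimal sketch: training a small CNN with backpropagation + SGD (PyTorch).
# The architecture and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional feature extractor
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                  # classifier head (28x28 input assumed)
)

optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch of 28x28 grayscale images.
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()      # backpropagation computes gradients for every layer
optimizer.step()     # gradient descent updates the weights
```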

Does sigmoid suffer from vanishing gradient?

Vanishing gradients are common when the sigmoid or tanh activation functions are used in the hidden layer units. When the inputs grow extremely small or extremely large, the sigmoid function saturates at 0 and 1 while the tanh function saturates at -1 and 1. In these saturated regions the derivative is close to zero, so very little gradient flows back through the layer.
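
You can see this saturation numerically with a small NumPy sketch (my own illustration, not part of the quoted snippet): the sigmoid derivative never exceeds 0.25 and decays towards zero for large |x|, so a chain of such derivatives multiplied together during backpropagation shrinks rapidly.

```python
# Sketch: sigmoid and tanh derivatives saturate for large |x|, so products of
# many such derivatives (as in backprop through many layers) approach zero.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # maximum value is 0.25, at x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # maximum value is 1.0, at x = 0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")

# Even in the best case (x = 0), chaining 20 sigmoid layers scales the
# gradient by at most 0.25**20, which is on the order of 1e-12.
print("0.25 ** 20 =", 0.25 ** 20)
```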

Which is a method used to overcome vanishing gradient?

Residual networks. One of the most effective ways to resolve the vanishing gradient problem is with residual neural networks, or ResNets (not to be confused with recurrent neural networks). ResNets are neural networks in which skip connections, or residual connections, are part of the network architecture.
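
For a rough idea of what a skip connection looks like (a PyTorch sketch of my own, not taken from the snippet), a residual block adds its input back onto the output of a couple of layers, giving gradients a short path back to earlier layers:

```python
# Sketch of a basic residual block: the skip connection (x + ...) lets
# gradients flow directly past the convolutions during backpropagation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(x + out)   # skip connection: identity added to the residual

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```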

Why is the vanishing gradient a problem?

With a saturating activation function, a large change in the input produces only a very small change in the output, so the corresponding gradient is tiny. The vanishing gradient problem is one example of the unstable behaviour of a multilayer neural network: the network is unable to backpropagate useful gradient information to the layers closest to the input, so those layers learn very slowly or not at all.


1 Answer

Convolutional neural networks (like standard sigmoid neural networks) do suffer from the vanishing gradient problem. The most recommended approaches to overcome the vanishing gradient problem are:

  • Layerwise pre-training
  • Choice of the activation function

You may notice that state-of-the-art deep neural networks for computer vision problems (like the ImageNet winners) use convolutional layers as the first few layers of their network, but this is not the key to solving the vanishing gradient problem. The key is usually training the network greedily, layer by layer. Using convolutional layers has several other important benefits, of course. Especially in vision problems, where the input size is large (the pixels of an image), using convolutional layers for the first layers is recommended because they have fewer parameters than fully-connected layers, so you don't end up with billions of parameters in the first layer (which would make your network prone to overfitting).
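
To make the "greedy layer-by-layer" idea concrete, here is a rough sketch (my own illustration, not the answerer's code; the layer sizes, learning rate, and epoch count are placeholder assumptions) of layerwise pre-training with stacked autoencoders: each new layer is trained to reconstruct the output of the layers trained before it, and only afterwards is the whole stack fine-tuned end to end.

```python
# Sketch: greedy layerwise pre-training with stacked autoencoders (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

layer_sizes = [784, 256, 64]      # e.g. flattened 28x28 inputs
data = torch.randn(512, 784)      # dummy unlabeled data

encoders = []
inputs = data
for in_dim, out_dim in zip(layer_sizes[:-1], layer_sizes[1:]):
    encoder = nn.Sequential(nn.Linear(in_dim, out_dim), nn.Sigmoid())
    decoder = nn.Linear(out_dim, in_dim)
    opt = optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=0.1)

    # Train this layer in isolation to reconstruct its own input.
    for _ in range(100):
        opt.zero_grad()
        loss = F.mse_loss(decoder(encoder(inputs)), inputs)
        loss.backward()
        opt.step()

    encoders.append(encoder)
    inputs = encoder(inputs).detach()   # the next layer trains on this layer's codes

# Stack the pre-trained encoders and fine-tune end to end (e.g. with labels).
network = nn.Sequential(*encoders, nn.Linear(layer_sizes[-1], 10))
```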

However, it has been shown (for example, in this paper) for several tasks that using rectified linear units alleviates the problem of vanishing gradients (as opposed to conventional sigmoid functions).
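
One way to see this effect (a small sketch of my own with arbitrary depth and width, not an experiment from the answer or the paper) is to compare how much gradient reaches the first layer of an otherwise identical deep network with sigmoid versus ReLU activations:

```python
# Sketch: gradient magnitude at the first layer of a deep MLP,
# sigmoid vs ReLU hidden activations. Depth and widths are arbitrary.
import torch
import torch.nn as nn

def first_layer_grad_norm(activation, depth=20, width=64):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), activation()]
    net = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(128, width)
    net(x).mean().backward()
    return net[0].weight.grad.norm().item()   # gradient at the first layer

torch.manual_seed(0)
print("sigmoid:", first_layer_grad_norm(nn.Sigmoid))
print("relu:   ", first_layer_grad_norm(nn.ReLU))
# Typically the sigmoid network's first-layer gradient is orders of magnitude
# smaller, illustrating the vanishing-gradient effect.
```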

Amin Suzani answered Oct 02 '22