I think I read somewhere that convolutional neural networks do not suffer from the vanishing gradient problem as much as standard sigmoid neural networks do as the number of layers increases. But I have not been able to find a 'why'.
Does it truly not suffer from the problem, or am I wrong and it depends on the activation function? [I have been using rectified linear units, so I have never tested sigmoid units in convolutional neural networks.]
Vanishing gradients are common when the sigmoid or tanh activation function is used in the hidden layers. When the inputs grow extremely small or extremely large, the sigmoid function saturates at 0 and 1, while the tanh function saturates at -1 and 1. In these saturated regions the derivative is close to zero, so a large change in the input produces only a tiny change in the output. The vanishing gradient problem is one example of the unstable behaviour of a multilayer neural network: the network is unable to backpropagate useful gradient information all the way to its input layers.

Residual networks: one of the newest and most effective ways to mitigate the vanishing gradient problem is the residual neural network, or ResNet (not to be confused with a recurrent neural network). ResNets are networks in which skip connections, also called residual connections, are part of the architecture; the identity path lets gradients flow around each block instead of only through it.
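As a rough illustration of both points, the sketch below treats the network as a chain of scalar sigmoid units (a deliberate simplification; the depth and the pre-activation value are arbitrary assumptions). Backpropagation multiplies one local derivative per layer, so the product collapses for the plain chain, while an identity skip connection adds 1 to each local derivative and keeps the product away from zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25, and decays quickly for large |x|

depth = 20
x = 2.5   # a moderately large pre-activation; the local derivative is already small here

# Plain chain of sigmoid layers: backpropagation multiplies one small
# derivative per layer, so the gradient shrinks geometrically with depth.
plain_grad = np.prod([dsigmoid(x) for _ in range(depth)])

# With an identity skip connection around each layer (y = x + sigmoid(x)),
# the local derivative becomes 1 + sigmoid'(x), so the product stays far from zero.
residual_grad = np.prod([1.0 + dsigmoid(x) for _ in range(depth)])

print(f"sigmoid'({x}) = {dsigmoid(x):.4f}")
print(f"gradient factor through {depth} plain sigmoid layers:    {plain_grad:.2e}")
print(f"gradient factor through {depth} residual sigmoid layers: {residual_grad:.2e}")
```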
Convolutional neural networks (like standard sigmoid neural networks) do suffer from the vanishing gradient problem. The most recommended approaches to overcome it are:

- layer-wise (greedy) pre-training
- choosing a different activation function, such as the rectified linear unit
You may notice that the state-of-the-art deep neural networks for computer vision problems (like the ImageNet winners) use convolutional layers as the first few layers of their network, but that is not the key to solving the vanishing gradient problem; the key is usually training the network greedily, layer by layer. Using convolutional layers has several other important benefits, of course. Especially in vision problems, where the input size is large (the pixels of an image), using convolutional layers for the first layers is recommended because they have fewer parameters than fully-connected layers and you don't end up with billions of parameters for the first layer (which would make your network prone to overfitting); a rough comparison is sketched below.
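To make the parameter-count argument concrete, here is a small sketch in PyTorch; the 224x224 RGB input, the 64 output units, and the 3x3 kernel are assumptions chosen for illustration, not values from any particular ImageNet model.

```python
import torch.nn as nn

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# Assumed input: a 224x224 RGB image, flattened to a vector for the dense layer.
fc = nn.Linear(224 * 224 * 3, 64)        # 64 fully-connected output units
conv = nn.Conv2d(3, 64, kernel_size=3)   # 64 feature maps, 3x3 kernels shared across positions

print(f"fully-connected first layer: {n_params(fc):,} parameters")   # ~9.6 million
print(f"convolutional first layer:   {n_params(conv):,} parameters")  # 1,792
```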
However, it has been shown (for example, in this paper) for several tasks that using rectified linear units alleviates the problem of vanishing gradients (as opposed to conventional sigmoid functions).
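A minimal way to see this effect is sketched below: the same deep multilayer perceptron is backpropagated once with sigmoid activations and once with ReLU, and the gradient reaching the first layer is compared. The depth, width, batch size, and default initialization are arbitrary assumptions and the exact numbers will vary, but the gap is typically many orders of magnitude.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation_cls, depth: int = 20, width: int = 64) -> float:
    """Build a deep MLP with the given activation, backpropagate a dummy loss,
    and return the gradient norm at the first layer's weights."""
    torch.manual_seed(0)                          # same weights and input for both runs
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), activation_cls()]
    model = nn.Sequential(*layers)

    x = torch.randn(16, width)                    # dummy batch of 16 inputs
    model(x).sum().backward()                     # dummy scalar loss, just to obtain gradients
    return model[0].weight.grad.norm().item()

print("sigmoid:", first_layer_grad_norm(nn.Sigmoid))  # typically vanishingly small
print("relu:   ", first_layer_grad_norm(nn.ReLU))     # typically orders of magnitude larger
```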