Many of the papers I have read so far mention that "pre-training a network could improve computational efficiency in terms of back-propagating errors", and that this can be achieved using RBMs or autoencoders.
If I have understood correctly, autoencoders work by learning the identity function, and if they have fewer hidden units than the size of the input data, they also perform compression. BUT what does this have to do with improving the computational efficiency of propagating the error signal backwards? Is it because the weights of the pre-trained hidden units do not diverge much from their initial values?
Data scientists reading this will already know that autoencoders use the inputs as their target values, since they learn the identity function, which is why this is regarded as unsupervised learning. But can such a method be applied to convolutional neural networks, where the first hidden layer is a feature map? Each feature map is created by convolving a learned kernel with a receptive field in the image. How could this learned kernel be obtained by pre-training (in an unsupervised fashion)?
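To make the question concrete, here is a rough sketch of what I imagine such unsupervised kernel learning might look like: a small convolutional autoencoder that learns its kernels purely by reconstructing the input. PyTorch and the layer sizes here are my own assumptions for illustration, not something taken from the papers.

```python
import torch
import torch.nn as nn

# Encoder kernels are learned with no labels, only by reconstructing the image.
conv_autoencoder = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2),            # encoder: 8 learned 5x5 kernels
    nn.ReLU(),
    nn.ConvTranspose2d(8, 1, kernel_size=5, padding=2),   # decoder: map back to image space
)

optimizer = torch.optim.Adam(conv_autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

images = torch.randn(16, 1, 28, 28)       # stand-in for an unlabeled image batch
optimizer.zero_grad()
reconstruction = conv_autoencoder(images)
loss = loss_fn(reconstruction, images)    # target is the input itself (identity)
loss.backward()
optimizer.step()

# The pre-trained kernels (conv_autoencoder[0].weight) could then be used to
# initialize the first convolutional layer of a supervised CNN.
```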
For image recognition tasks, using pre-trained models is great. For one, they are easier to use, as they give you the architecture for “free.” Additionally, they typically give better results and require less training.
2. Pre-training. In simple terms, pre-training a neural network means first training a model on one task or dataset, and then using the parameters or model from that training to train another model on a different task or dataset. This gives the model a head start instead of starting from scratch.
Using pre-trained models allows you to achieve the same or even better performance much faster and with much less labeled data.
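As an illustration only (PyTorch, the toy layer sizes, and the random data below are assumptions, not part of the original answer), pre-training and reuse of the learned parameters might look like this:

```python
import torch
import torch.nn as nn

# Shared feature extractor whose parameters will be pre-trained and then reused.
feature_extractor = nn.Sequential(nn.Linear(100, 64), nn.ReLU(),
                                  nn.Linear(64, 32), nn.ReLU())

# Step 1: pre-train on task A (larger dataset, 10 classes).
model_a = nn.Sequential(feature_extractor, nn.Linear(32, 10))
opt_a = torch.optim.Adam(model_a.parameters(), lr=1e-3)
x_a, y_a = torch.randn(512, 100), torch.randint(0, 10, (512,))
loss_a = nn.functional.cross_entropy(model_a(x_a), y_a)
opt_a.zero_grad(); loss_a.backward(); opt_a.step()

# Step 2: reuse the pre-trained features on task B (small labeled dataset, 3 classes).
# Only the head is new; the feature extractor starts from the task-A parameters.
model_b = nn.Sequential(feature_extractor, nn.Linear(32, 3))
opt_b = torch.optim.Adam(model_b.parameters(), lr=1e-4)
x_b, y_b = torch.randn(32, 100), torch.randint(0, 3, (32,))
loss_b = nn.functional.cross_entropy(model_b(x_b), y_b)
opt_b.zero_grad(); loss_b.backward(); opt_b.step()
```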
One thing to note is that autoencoders try to learn a non-trivial approximation of the identity function, not the identity function itself; otherwise they would not be useful at all. Pre-training helps move the weight vectors towards a good starting point on the error surface. Then the backpropagation algorithm, which is basically doing gradient descent, is used to improve upon those weights. Note that gradient descent gets stuck in the closest local minimum.
[In the image that was posted here, ignore the term “Global Minima” and think of it as another, better, local minimum.]
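As a toy illustration of that last point (the error surface, learning rate, and starting points below are made up for demonstration), plain gradient descent settles into whichever minimum is closest to where it starts, which is why a good starting point matters:

```python
def loss(w):
    # A made-up 1-D error surface with two minima, near w = 1 and w = 3.
    # Both happen to have the same loss value here; the point is only that
    # gradient descent converges to the nearest basin.
    return (w - 1.0) ** 2 * (w - 3.0) ** 2

def grad(w):
    # Derivative of the surface above.
    return 2 * (w - 1.0) * (w - 3.0) ** 2 + 2 * (w - 3.0) * (w - 1.0) ** 2

def descend(w, lr=0.01, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

print(descend(0.0))   # starts in the first basin  -> converges to ~1.0
print(descend(4.0))   # starts in the second basin -> converges to ~3.0
```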
Intuitively speaking, suppose you are looking for an optimal path to get from origin A to destination B. Having a map with no routes marked on it (the errors you obtain at the last layer of the neural network model) tells you roughly where to go, but you may end up on a route with a lot of obstacles, uphill and downhill. Now suppose someone tells you about a route they have taken before (the pre-training) and hands you a new map (the pre-training phase's starting point).
This could be an intuitive reason why starting with random weights and immediately optimizing the model with backpropagation may not achieve the performance you obtain with a pre-trained model. However, note that many models achieving state-of-the-art results do not use pre-training at all; they may use backpropagation in combination with other optimization methods (e.g. Adagrad, RMSProp, momentum, etc.) to hopefully avoid getting stuck in a bad local minimum.
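For reference, here is how those optimizers might be instantiated (a sketch assuming PyTorch; the model and hyper-parameters are placeholders, not recommendations):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

sgd_momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adagrad = torch.optim.Adagrad(model.parameters(), lr=0.01)
rmsprop = torch.optim.RMSprop(model.parameters(), lr=0.001)

# Any of these can replace plain SGD in the usual training step:
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
sgd_momentum.zero_grad(); loss.backward(); sgd_momentum.step()
```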