As is well known, the main problem with deep neural networks (DNNs) is the long training time.
But there are some ways to accelerate learning:
1. Batch Normalization, which normalizes each layer's inputs as (x - mean) / sqrt(variance): https://arxiv.org/abs/1502.03167. "Batch Normalization achieves the same accuracy with 14 times fewer training steps."
2. Non-saturating activation functions, i.e. f(x) = max(x, 0) and the rest of the rectified linear unit family (ReLU, LReLU, PReLU, RReLU): https://arxiv.org/abs/1505.00853. "The advantage of using non-saturated activation function lies in two aspects: The first is to solve the so called “exploding/vanishing gradient”. The second is to accelerate the convergence speed." (A minimal sketch of the normalization and ReLU steps follows this list.)
3. LSUV initialization (Layer-sequential unit-variance), which works with any of these activations (maxout, ReLU-family, tanh): https://arxiv.org/abs/1511.06422. "Our initialization matches the current state-of-the-art unsupervised or self-supervised pre-training methods on standard computer vision tasks, such as image classification and object detection, while being roughly three orders of magnitude faster." (A rough sketch of LSUV also follows this list.)
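To make the first two items concrete, here is a minimal NumPy sketch (my own illustration, not code from either paper) of the per-mini-batch normalization that Batch Normalization applies, followed by a ReLU. The epsilon term and the learned scale/shift parameters gamma and beta come from the Batch Normalization paper; the function names and toy data are made up for the example:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch: (x - mean) / sqrt(var + eps),
    then apply the learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)               # per-feature mean over the batch
    var = x.var(axis=0)                 # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def relu(x):
    """Rectified linear unit: f(x) = max(x, 0)."""
    return np.maximum(x, 0.0)

# Toy usage: a mini-batch of 4 examples with 3 features each.
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(4, 3))   # deliberately not zero-centered
gamma, beta = np.ones(3), np.zeros(3)
print(relu(batch_norm(batch, gamma, beta)))
```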
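And here is a rough NumPy sketch of the LSUV idea for fully connected layers. It is my own simplification of the paper's Algorithm 1 (biases are omitted, variance is measured on the pre-activation output, and the layer sizes and tolerance are arbitrary): start from orthonormal weights, then walk through the layers in order and rescale each weight matrix until that layer's output has roughly unit variance on a data batch.

```python
import numpy as np

def orthonormal(fan_in, fan_out, rng):
    """Orthonormal starting weights, the initialization LSUV assumes."""
    n = max(fan_in, fan_out)
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q[:fan_in, :fan_out]

def lsuv_init(layer_sizes, x_batch, rng, tol=0.05, max_iter=10):
    """Layer-sequential unit-variance init (simplified sketch of arXiv:1511.06422).

    layer_sizes: e.g. [784, 256, 128, 10]; x_batch: a representative data batch.
    Returns one weight matrix per layer, each rescaled so that the layer's
    pre-activation output has variance ~1 on x_batch.
    """
    weights = []
    h = x_batch
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = orthonormal(fan_in, fan_out, rng)
        for _ in range(max_iter):
            var = np.var(h @ w)
            if abs(var - 1.0) < tol:
                break
            w /= np.sqrt(var)           # scale weights toward unit output variance
        h = np.maximum(h @ w, 0.0)      # propagate through ReLU to the next layer
        weights.append(w)
    return weights

# Toy usage
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 784))
ws = lsuv_init([784, 256, 128, 10], x, rng)
```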
But if we use all of these steps, (1) Batch Normalization, (2) ReLU, and (3) fast weight initialization such as LSUV, then is there any sense in using an autoencoder/autoassociator at any stage of training a deep neural network?
Autoencoders can be seen as an alternative method to initialize the weights in a smart way. As such, you use autoencoders instead of the "fast" weight initialization algorithm you describe.
Autoencoders and RBMs are/were frequently used to pre-train a deep neural network. Early deep neural networks were almost impossible to train, due to the very high-dimensional parameter space. A simple stochastic gradient descent algorithm only converged very slowly and would usually get stuck in a bad local optimum. A solution to this problem was to use RBMs (G. Hinton et al.) or Autoencoders (Y. Bengio et al.) to pre-train the network in an unsupervised fashion.
This comes with two large advantages: the pre-training needs only unlabeled data, which is usually abundant, and it gives the supervised phase a far better starting point than random weights.
After training the RBM or autoencoder, you would place an output layer on top of the pre-trained network, and train the whole network in a supervised fashion with backpropagation. This step is also called fine-tuning. As all layers except the output layer are already pre-trained, the weights don't have to be changed much, and you will find a solution very quickly.
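For illustration, here is a minimal PyTorch sketch of that workflow (my own assumption of tooling and hyperparameters, not code from any of the cited authors): greedily pre-train each layer as a small autoencoder on unlabeled data, then stack the pre-trained encoders, add a fresh output layer, and fine-tune the whole network with backpropagation on the labeled data.

```python
import torch
from torch import nn, optim

def pretrain_layer(encoder, data, epochs=10, lr=1e-3):
    """Train `encoder` as half of an autoencoder: encode, decode, and
    minimize reconstruction error on unlabeled data."""
    decoder = nn.Linear(encoder.out_features, encoder.in_features)
    opt = optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon = decoder(torch.relu(encoder(data)))
        loss = loss_fn(recon, data)
        loss.backward()
        opt.step()
    return encoder

# Unlabeled data for pre-training, labeled data for fine-tuning (toy tensors here).
unlabeled = torch.randn(512, 784)
labeled_x, labeled_y = torch.randn(128, 784), torch.randint(0, 10, (128,))

# 1) Greedy layer-wise pre-training: each layer learns to reconstruct its own input.
sizes = [784, 256, 64]
encoders, h = [], unlabeled
for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
    enc = pretrain_layer(nn.Linear(fan_in, fan_out), h)
    h = torch.relu(enc(h)).detach()      # becomes the input of the next autoencoder
    encoders.append(enc)

# 2) Fine-tuning: stack the pre-trained encoders, put an output layer on top,
#    and train the whole network in a supervised fashion with backpropagation.
model = nn.Sequential(
    encoders[0], nn.ReLU(),
    encoders[1], nn.ReLU(),
    nn.Linear(sizes[-1], 10),            # freshly initialized output layer
)
opt = optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    opt.zero_grad()
    loss = loss_fn(model(labeled_x), labeled_y)
    loss.backward()
    opt.step()
```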
Does it make sense to use autoencoders? If you have lots and lots of labeled training data, why even bother? Just initialize the weights as smartly as you can and let the GPUs roar for a couple of weeks.
However, if labeled training data is scarce, gather lots of unlabeled data and train autoencoders on it. That way, you ensure fast convergence and a good solution with the few labeled examples you have.