 

Does it make sense to use an autoencoder for a network with batch normalization?

As is well known, the main problem with DNNs is the long training time.

But there are some ways to accelerate learning:

  1. Batch Normalization = (x - mean) / sqrt(variance + ε): https://arxiv.org/abs/1502.03167 (see the sketch after this list)

Batch Normalization achieves the same accuracy with 14 times fewer training steps

  2. ReLU = max(x, 0), the rectified linear unit (and its variants LReLU, PReLU, RReLU): https://arxiv.org/abs/1505.00853

The advantage of using a non-saturating activation function lies in two aspects: the first is to avoid the so-called "exploding/vanishing gradient" problem; the second is to accelerate convergence.

Or any one of: maxout, the ReLU family, tanh.

  3. Fast weight initialization (which avoids vanishing or exploding gradients): https://arxiv.org/abs/1511.06856

Our initialization matches the current state-of-the-art unsupervised or self-supervised pre-training methods on standard computer vision tasks, such as image classification and object detection, while being roughly three orders of magnitude faster.

Or LSUV-initialization (Layer-sequential unit-variance): https://arxiv.org/abs/1511.06422
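
For concreteness, here is a minimal numpy sketch of the three items above. The batch-norm function omits the learnable scale/shift (gamma, beta) from the paper, the He-style `scaled_init` is only a simple stand-in for the data-dependent "fast"/LSUV initializations cited, and the shapes and epsilon are illustrative assumptions:

```python
# Minimal numpy sketch of the three acceleration tricks listed above.
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a mini-batch per feature: (x - mean) / sqrt(variance + eps).
    The full method also learns a scale (gamma) and shift (beta), omitted here."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return np.maximum(x, 0.0)

def scaled_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Variance-scaled (He-style) initialization, used here only as a simple
    stand-in for the data-dependent fast/LSUV initializations cited above."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = np.random.randn(32, 100)                       # mini-batch: 32 examples, 100 features
h = relu(batch_norm(x) @ scaled_init(100, 50))     # one normalized, rectified layer
```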

But if we use all of these: (1) Batch Normalization, (2) ReLU, (3) fast weight initialization or LSUV - then does it still make sense to use an autoencoder/autoassociator at any step of training a deep neural network?

asked Jan 05 '23 by Alex


1 Answer

tl;dr

Autoencoders can be seen as an alternative method to initialize the weights in a smart way. As such, you use autoencoders instead of the "fast" weight initialization algorithm you describe.

More detailed explanation

Autoencoders and RBMs are/were frequently used to pre-train a deep neural network. Early deep neural networks were almost impossible to train, due to the very high-dimensional parameter space. A simple stochastic gradient descent algorithm only converged very slowly and would usually get stuck in a bad local optimum. A solution to this problem was to use RBMs (G. Hinton et al.) or Autoencoders (Y. Bengio et al.) to pre-train the network in an unsupervised fashion.

This comes with two large advantages:

  1. You do not need lots of labeled training data. Often, there is a lot of unlabeled data available (think: images on the internet), but labeling them is a very expensive task.
  2. You can greedily train them layer-by-layer. That means you first train a single (1-layer) autoencoder. Once it achieves a good reconstruction, you stack another autoencoder on top of it and train the second autoencoder without touching the first. This keeps the number of parameters trained at each step low, and thus makes training simpler and faster (see the sketch below).
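
To illustrate the layer-by-layer procedure in point 2, here is a hedged PyTorch sketch. The layer sizes, optimizer, learning rate, and epoch count are assumptions, and full-batch updates are used only to keep the sketch short; a real setup would use mini-batches (and often denoising or tied weights):

```python
# Hedged PyTorch sketch of greedy layer-wise autoencoder pre-training.
import torch
import torch.nn as nn

def pretrain_layer(data, in_dim, hidden_dim, epochs=10, lr=1e-3):
    """Train one 1-layer autoencoder on `data` and return its encoder."""
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
    decoder = nn.Linear(hidden_dim, in_dim)
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(decoder(encoder(data)), data)   # reconstruction loss
        loss.backward()
        optimizer.step()
    return encoder

# Unlabeled data: 1000 samples with 784 features (e.g. flattened images).
unlabeled = torch.randn(1000, 784)

# Stack autoencoders greedily: each new layer is trained on the frozen
# output of the layers below it, without touching their weights.
sizes = [784, 256, 64]
encoders, features = [], unlabeled
for in_dim, hidden_dim in zip(sizes[:-1], sizes[1:]):
    enc = pretrain_layer(features, in_dim, hidden_dim)
    encoders.append(enc)
    with torch.no_grad():
        features = enc(features)
```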

After training the RBM or autoencoder, you would place an output layer on top of the pre-trained network, and train the whole network in a supervised fashion with backpropagation. This step is also called fine-tuning. As all layers except the output layer are already pre-trained, the weights don't have to be changed much, and you will find a solution very quickly.
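
Continuing the sketch above (and reusing its hypothetical `encoders` and `sizes`), fine-tuning would then look roughly like this; the number of classes, learning rate, and epoch count are again assumptions:

```python
# Fine-tuning sketch: stack the pre-trained encoders, add an output layer,
# and train the whole network in a supervised fashion with backpropagation.
labeled_x = torch.randn(100, 784)                # a few labeled examples
labeled_y = torch.randint(0, 10, (100,))         # assumed 10-class problem

network = nn.Sequential(*encoders, nn.Linear(sizes[-1], 10))
optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for _ in range(20):                              # fine-tuning epochs (assumption)
    optimizer.zero_grad()
    loss = loss_fn(network(labeled_x), labeled_y)
    loss.backward()
    optimizer.step()
```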

Finally, the answer to your question

Does it make sense to use autoencoders? If you have lots and lots of labeled training data, why even bother? Just initialize the weights as smartly as you can, and let the GPUs roar for a couple of weeks.

However, if labeled training data is scarce, gather lots of unlabeled data and train autoencoders. With that, you make sure that you achieve a fast convergence and a good solution with the few labeled examples you have.

answered Jan 13 '23 by hbaderts