
Is it normal to use batch normalization in RNN & LSTM? [closed]


I know that in regular neural nets people use batch norm before the activation, and that it reduces the reliance on good weight initialization. I wonder whether it does the same for an RNN/LSTM when I use it there. Does anyone have any experience with it?

Peter Deng asked Aug 03 '17


People also ask

Can we use batch normalization with RNN?

Batch normalization can be applied between stacked RNN layers, where the normalization is applied "vertically", i.e. to the output of each recurrent layer. It cannot be applied "horizontally", i.e. between timesteps, because the repeated rescaling hurts training and leads to exploding gradients.
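
For concreteness, here is a minimal sketch (not from the original post) of this "vertical" arrangement in tf.keras: a BatchNormalization layer placed between two stacked LSTM layers. The layer sizes, sequence length, and feature count are arbitrary assumptions.

    # Sketch only: batch normalization applied "vertically" between stacked
    # LSTM layers; all shapes and sizes here are arbitrary.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(50, 16)),            # (timesteps, features)
        layers.LSTM(64, return_sequences=True),  # first recurrent layer
        # With the default axis=-1, statistics are pooled over the batch and
        # time axes for each feature; the normalization sits *between* the
        # recurrent layers, not inside the recurrent loop.
        layers.BatchNormalization(),
        layers.LSTM(64),                         # second recurrent layer
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()

Normalizing inside the recurrent loop itself, i.e. across timesteps, is the case the accepted answer below argues against.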

Should you always use batch normalization?

Batch normalization is used in almost all recent deep learning architectures to speed up convergence and improve performance. However, few works are actually concerned with its drawbacks; BN is often treated as a magic ingredient that simply benefits the model.

When should batch normalization be used?

Batch normalization is also used to maintain the distribution of the data. It is one of the important additions to a model: it acts as a regularizer, normalizes the layer inputs, helps stabilize backpropagation, and can be adapted to most models to help them converge better.

Does Lstm need batch normalization?

The LSTM architecture suffers less from vanishing gradients and therefore has more memory than a vanilla recurrent network. Batch normalization lends a higher training speed to the model.




1 Answer

No, you cannot use Batch Normalization on a recurrent neural network, because the statistics are computed per batch, which does not account for the recurrent part of the network. Weights are shared in an RNN, and the activation response for each "recurrent loop" might have completely different statistical properties.

Other techniques similar to Batch Normalization that take these limitations into account have been developed, for example Layer Normalization. There are also reparametrizations of the LSTM layer that allow Batch Normalization to be used, for example as described in Recurrent Batch Normalization by Cooijmans et al. (2016).
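
To illustrate the Layer Normalization route, here is a small hypothetical tf.keras sketch (not part of the original answer, and not the exact scheme of Cooijmans et al.; the class name LayerNormLSTMCell is made up for this example): an LSTM cell that layer-normalizes its hidden state at every timestep. Because layer normalization uses per-sample statistics over the feature axis, it is well defined inside the recurrence, where batch statistics are not.

    # Hypothetical sketch: an LSTM cell whose hidden state is layer-normalized
    # at every timestep. Per-sample feature statistics make this valid inside
    # the recurrent loop, unlike batch statistics.
    import tensorflow as tf
    from tensorflow.keras import layers

    class LayerNormLSTMCell(layers.LSTMCell):
        def __init__(self, units, **kwargs):
            super().__init__(units, **kwargs)
            self.ln = layers.LayerNormalization()

        def call(self, inputs, states, training=None):
            output, new_states = super().call(inputs, states, training=training)
            h = self.ln(output)               # normalize the hidden state per sample
            return h, [h, new_states[1]]      # feed normalized h back; keep cell state c as-is

    # Wrap the custom cell in a standard RNN layer.
    x = tf.random.normal((8, 50, 16))         # (batch, timesteps, features)
    rnn = layers.RNN(LayerNormLSTMCell(64), return_sequences=True)
    print(rnn(x).shape)                       # (8, 50, 64)

The Cooijmans et al. reparametrization is different: it applies batch normalization separately to the input-to-hidden and hidden-to-hidden transformations inside the gates, which requires rewriting the cell equations rather than wrapping a stock cell like this.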

Dr. Snoopy answered Sep 23 '22