
Batch normalization in neural networks

I'm still fairly new to ANNs and I was just reading the Batch Normalization paper (http://arxiv.org/pdf/1502.03167.pdf), but I'm not sure I understand what they are doing (and, more importantly, why it works).

So let's say I have two layers L1 and L2, where L1 produces outputs and sends them to the neurons in L2. Does batch normalization just take all the outputs from L1 (i.e. every single output from every single neuron, giving an overall vector of |L1| × |L2| numbers for a fully connected network), normalize them to have a mean of 0 and an SD of 1, and then feed them to their respective neurons in L2 (plus applying the linear transformation with the gamma and beta they discuss in the paper)?
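In code, here is roughly what I think the transform looks like for one mini-batch (a minimal NumPy sketch of my reading of the paper; the function name and the eps constant are mine, not from the paper):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: activations coming out of L1 for one mini-batch, shape (batch_size, |L1|)
    mu = x.mean(axis=0)                    # per-activation mean over the mini-batch
    var = x.var(axis=0)                    # per-activation variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # each activation now has mean 0, SD ~1
    return gamma * x_hat + beta            # the learnable scale/shift from the paper
```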

If this is indeed the case, how does this help the NN? What's so special about a constant distribution?

asked Apr 30 '15 by WhiteTiger

People also ask

What is batch normalization in a convolutional neural network?

Batch Norm is a normalization technique applied between the layers of a neural network rather than to the raw input data. It is computed over mini-batches instead of the full data set. It speeds up training and allows higher learning rates, making learning easier.

How does batch normalization work?

Batch Norm is just another network layer that gets inserted between a hidden layer and the next hidden layer. Its job is to take the outputs from the first hidden layer and normalize them before passing them on as the input of the next hidden layer. Just like the weights and biases of other layers, its own parameters (the scale gamma and the shift beta) are learned during training.
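For illustration, inserting such a layer in a small fully connected network might look like this (a PyTorch sketch added here for concreteness; the layer sizes are arbitrary and not from the original question):

```python
import torch.nn as nn

# A small fully connected net with Batch Norm inserted between layers,
# following the paper's placement: linear transform -> BN -> nonlinearity.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalizes the 256 activations over each mini-batch
    nn.ReLU(),
    nn.Linear(256, 10),
)
```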

When should I use batch normalization?

We can use Batch Normalization in convolutional neural networks, recurrent neural networks, and plain feed-forward networks. In practice, it is usually inserted between a layer's linear transformation and its activation function (the placement used in the original paper), although some practitioners place it after the activation instead.

What is normalization in a neural network?

Normalization helps the training of neural networks because it puts the different features on a similar scale, which stabilizes the gradient descent step and lets us use larger learning rates, or converge faster for a given learning rate.
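As a toy illustration of that point (the data and feature names are made up, not from the question), standardizing input features puts values measured in very different units on the same scale:

```python
import numpy as np

# Two input features on very different scales: age in years, income in dollars.
X = np.array([[25.0,  50_000.0],
              [40.0,  90_000.0],
              [33.0, 120_000.0]])

# Standardize each feature to zero mean and unit variance so neither feature
# dominates the gradient simply because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```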


1 Answer

During standard SGD training of a network, the distribution of inputs to a hidden layer changes over time, because the hidden layer before it is constantly changing as well. This is what the paper calls internal covariate shift, and it can be a problem.

It is known that neural networks converge faster if the training data is "whitened", that is, transformed in such a way that each component has zero mean and unit variance and is decorrelated from the other components. See the papers (LeCun et al., 1998b) and (Wiesler & Ney, 2011) cited in the paper.

The authors' idea is to apply this kind of normalization not only to the network's input, but to the input of every intermediate layer as well. Full whitening computed over the entire training set would be far too expensive, so they make two simplifications: each activation is normalized independently (to zero mean and unit variance, without decorrelation), and the statistics are estimated per mini-batch rather than over the whole dataset. They claim that this vastly speeds up training and also acts as a sort of regularization.
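To make the batch-wise part concrete, here is a rough NumPy sketch of a single Batch Norm layer at training versus inference time (my own simplification of the paper's algorithm; the names and the running-average momentum are illustrative, and backpropagation/learning of gamma and beta is omitted):

```python
import numpy as np

class BatchNorm:
    """Minimal sketch of batch-wise normalization for one hidden layer."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.gamma = np.ones(num_features)    # learned scale
        self.beta = np.zeros(num_features)    # learned shift
        self.eps = eps
        self.momentum = momentum
        # Running estimates used at inference time instead of batch statistics.
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training=True):
        if training:
            mu = x.mean(axis=0)               # statistics of this mini-batch only
            var = x.var(axis=0)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mu
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

At inference time the mini-batch statistics are replaced by the accumulated running estimates, which plays the role of the population statistics used in the paper.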

answered Sep 23 '22 by cfh