What is "batch normalization"? Why use it? How does it affect prediction?

Recently, many deep architectures use "batch normalization" for training.

What is "batch normalization"? What does it do mathematically? In what way does it help the training process?

How is batch normalization used during training? Is it a special layer inserted into the model? Do I need to normalize before each layer, or only once?

Suppose I used batch normalization for training. Does this affect my test-time model? Should I replace the batch normalization with some other/equivalent layer/operation in my "deploy" network?

This question about batch normalization covers only part of what I am asking; I was hoping for a more detailed answer. More specifically, I would like to know how training with batch normalization affects test-time prediction, i.e., the "deploy" network and the TEST phase of the net.

897

asked Dec 21 '16 18:12

Shai

2 Answers

Batch normalization is for layers that can suffer from deleterious drift. The math is simple: find the mean and variance of each component over the batch, then apply the standard transformation to convert all values to their corresponding Z-scores: subtract the mean and divide by the standard deviation. This ensures that the component ranges are very similar, so that each has a chance to affect the training deltas (in back-prop).
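The Z-score step described above can be sketched in plain Python. This is only an illustration: the function name `batch_zscores`, the small `eps` term added for numerical stability, and the sample values are assumptions, and the learned scale and shift that a full batch normalization layer also applies are left out.

```python
import math

def batch_zscores(batch, eps=1e-5):
    """Convert each component (column) of a mini-batch to Z-scores.

    batch: list of rows, each row a list of floats of equal length.
    eps:   assumed small constant that avoids division by zero.
    """
    n = len(batch)
    dims = len(batch[0])
    # Per-component mean and (biased) variance over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    # Subtract the mean and divide by the standard deviation.
    normalized = [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]
    return normalized, means, variances

# Hypothetical batch: the two components differ in scale by a factor of 100.
batch = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
normalized, means, variances = batch_zscores(batch)
for row in normalized:
    print(row)
```

After normalization both columns cover nearly the same range, which is the point made above about giving every component a chance to influence the updates.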

If you're using the network for pure testing (no further training), then simply delete these layers; they've done their job. If you're training while testing / predicting / classifying, then leave them in place; the operations won't harm your results at all, and barely slow down the forward computations.

As for Caffe specifics, there's really nothing particular to Caffe. The computation is a basic stats process, and is the same algebra for any framework. Granted, there will be some optimizations for hardware that supports vector and matrix math, but those consist of simply taking advantage of the chip's built-in operations.

RESPONSE TO COMMENT

If you can afford a little extra training time, yes, you'd want to normalize at every layer. In practice, inserting them less frequently -- say, every 1-3 inceptions -- will work just fine.

You can ignore these in deployment because they've already done their job: when there's no back-propagation, there's no drift of weights. Also, when the model handles only one instance in each batch, the Z-score is always 0: every input is exactly the mean of the batch (being the entire batch).
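The single-instance case can be checked with a short sketch. The helper name `normalize_with_batch_stats`, the `eps` term, and the input values are assumptions made for illustration; the sketch only shows what happens when the statistics are computed from the current batch itself.

```python
import math

def normalize_with_batch_stats(batch, eps=1e-5):
    """Z-score each component using statistics of the current batch only."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]

# A "batch" holding one instance: each value equals its own batch mean.
single = [[3.7, -12.0, 250.0]]
print(normalize_with_batch_stats(single))  # → [[0.0, 0.0, 0.0]]
```

Whatever the input, the output is all zeros, so statistics taken from a one-instance batch carry no information about that instance.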

answered Nov 27 '22 18:11

Prune

As a complement to Prune's answer: during testing, the batch normalization layer uses the mean/variance/scale/shift values averaged over different training iterations to normalize its input (subtract the mean and divide by the standard deviation).
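A rough sketch of that test-time computation follows, assuming the stored mean and variance and the scale and shift are available as plain lists. The function name `batch_norm_inference`, the parameter names, the `eps` term, and the numbers are all hypothetical and do not come from Caffe or TensorFlow.

```python
import math

def batch_norm_inference(x, stored_mean, stored_var, scale, shift, eps=1e-5):
    """Normalize one input vector with stored statistics, then scale and shift.

    Nothing is computed from the current batch, so a single instance is
    handled the same way as a large batch.
    """
    return [
        scale[j] * (x[j] - stored_mean[j]) / math.sqrt(stored_var[j] + eps) + shift[j]
        for j in range(len(x))
    ]

# Hypothetical statistics gathered during training.
stored_mean = [2.0, 200.0]
stored_var = [4.0, 400.0]
scale = [1.0, 1.0]
shift = [0.0, 0.0]

print(batch_norm_inference([4.0, 180.0], stored_mean, stored_var, scale, shift))
```

Because every quantity is fixed after training, the operation is a per-component affine transform of the input.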

The original batch normalization paper from Google says only that a moving average should be used and gives no more thorough explanation. Both Caffe and TensorFlow use an exponential moving average.

In my experience, a simple moving average usually gives better validation accuracy than an exponential moving average (though this may need more experiments). For a comparison, you can refer to here (I implemented both moving average methods in channel_wise_bn_layer and compared them with the batch norm layer in BVLC/caffe).
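To make the two averaging schemes concrete, here is a minimal sketch applied to a list of per-iteration batch means. The function names, the `momentum` value of 0.9, the choice to start the exponential average from the first value, and the sample numbers are assumptions; actual frameworks differ in these details.

```python
def exponential_moving_average(values, momentum=0.9):
    """Recent values weigh more; older ones decay geometrically."""
    running = values[0]
    for v in values[1:]:
        running = momentum * running + (1.0 - momentum) * v
    return running

def simple_moving_average(values):
    """Every recorded value weighs the same."""
    return sum(values) / len(values)

# Hypothetical batch means recorded over five training iterations.
batch_means = [1.0, 2.0, 3.0, 2.0, 10.0]
print(exponential_moving_average(batch_means))
print(simple_moving_average(batch_means))
```

The two estimates respond differently to the outlier in the last iteration, which is one way the choice of method can change the statistics used at test time.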

answered Nov 27 '22 19:11

Dale

Tags: machine-learning, neural-network, deep-learning, normalization, caffe
