I've been playing around with batch normalization in Keras. I was wondering whether batch normalization also normalizes the inputs to the neural network. Does that mean I do not need to standardize the inputs to my network and can rely on BN to do it instead?
Normalization is useful when your data has varying scales and the algorithm you are using, such as k-nearest neighbors or an artificial neural network, does not make strong assumptions about the distribution of your data. Standardization, by contrast, works best when your data roughly follows a Gaussian (bell curve) distribution.
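A tiny NumPy sketch of the two rescalings; the values in x are just made-up illustration data:

```python
import numpy as np

x = np.array([1.0, 5.0, 10.0, 50.0])  # illustrative feature values

# Normalization (min-max scaling): rescales values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit standard deviation
x_std = (x - x.mean()) / x.std()

print(x_norm)  # [0.     0.0816 0.1837 1.    ]
print(x_std)   # approximately [-0.79 -0.59 -0.33  1.71]
```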
As the weights of the earlier layers are updated during training, the distribution of the hidden activations also changes. This change in the hidden activations is known as internal covariate shift. However, according to a study by MIT researchers, batch normalization does not actually solve the problem of internal covariate shift.
Disadvantages of Batch Normalization:
• It is difficult to estimate the mean and standard deviation of the input during testing.
• It cannot be used with a batch size of 1 during training.
• It adds computational overhead during training.
Batch Norm is just another network layer that gets inserted between one hidden layer and the next. Its job is to take the outputs of the first hidden layer and normalize them before passing them on as the input to the next hidden layer.
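As a rough illustration of this placement in Keras (the layer sizes and input dimension here are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative model: BatchNormalization sits between two hidden layers
# and normalizes the first layer's outputs before the next layer sees them.
model = keras.Sequential([
    keras.Input(shape=(20,)),             # arbitrary input dimension
    layers.Dense(64, activation="relu"),  # first hidden layer
    layers.BatchNormalization(),          # normalizes its outputs per batch
    layers.Dense(64, activation="relu"),  # next hidden layer
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```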
While you can certainly use it for that, batch normalization is not designed to do that, and you will most likely introduce sampling error into your normalization because of the limited sample size (the sample size is your batch size).
Another reason why I would not recommend using batch normalization to normalize your inputs is that it introduces the correction terms gamma and beta (trained parameters), which will skew your training data if not disabled.
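If you did want to experiment with BN as a plain normalizer anyway, the learned offset (beta) and scale (gamma) can be switched off in Keras via the center and scale arguments, though I still would not rely on this for input normalization:

```python
from tensorflow.keras import layers

# center=False drops the learned offset (beta),
# scale=False drops the learned scale (gamma),
# so the layer only applies the per-batch standardization.
bn_without_correction = layers.BatchNormalization(center=False, scale=False)
```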
For normalizing your test data, I would recommend z-score normalization fitted on the complete training set (e.g., via sklearn's StandardScaler) or some appropriate alternative, but not batch normalization.
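A minimal sketch of that approach with scikit-learn; X_train and X_test below are random placeholders for your own data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder data; replace with your own features.
X_train = np.random.rand(100, 20)
X_test = np.random.rand(30, 20)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit statistics on the training set only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on test data
```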