Batch normalization instead of input normalization

Tags: artificial-intelligence, machine-learning, neural-network, keras, batch-normalization

Can I use a batch normalization layer right after the input layer instead of normalizing my data? Can I expect a similar effect/performance?

With the Keras functional API it would be something like this:

from keras.layers import Input, BatchNormalization

inputs = Input(...)
x = BatchNormalization(...)(inputs)
...

477

asked Oct 16 '17 13:10

user2146414

1 Answer

Yes, you can. The nice thing about batchnorm, besides stabilizing the distribution of activations, is that the mean and standard deviation it applies are likely to shift as the network learns.

Effectively, placing batchnorm right after the input layer is a fancy data pre-processing step. It helps, sometimes a lot (e.g. in linear regression). But it is easier and more efficient to compute the mean and variance of the whole training set once than to re-estimate them for every batch. Note that batchnorm isn't free in terms of performance, so you shouldn't overuse it.
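To make the "compute it once" point concrete, here is a minimal sketch in plain Python (no Keras) for a single input feature. The helper names `fit_standardizer`, `standardize` and `batch_standardize`, the synthetic data, and the batch size are all made up for illustration. `batch_standardize` only mimics what batchnorm does in training mode and ignores its learned scale and shift, so treat this as a rough picture rather than what Keras actually runs.

```python
import math
import random
import statistics


def fit_standardizer(values):
    """Compute mean and population variance once over the whole training set."""
    return statistics.fmean(values), statistics.pvariance(values)


def standardize(values, mean, var, eps=1e-8):
    """Apply fixed, precomputed statistics to any data (train or test)."""
    scale = math.sqrt(var + eps)
    return [(v - mean) / scale for v in values]


def batch_standardize(batch, eps=1e-8):
    """Normalize a batch with its own statistics, as batchnorm does in
    training mode (learned scale/shift omitted in this sketch)."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch)
    return standardize(batch, mean, var, eps)


# Synthetic, unnormalized feature (assumed values, for illustration only).
random.seed(0)
train = [random.gauss(50.0, 10.0) for _ in range(1000)]

# Option 1: classic input normalization, statistics computed a single time.
mean, var = fit_standardizer(train)
normalized = standardize(train, mean, var)

# Option 2: per-batch statistics, recomputed for every batch.
batch_size = 32
batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]
batch_means = [statistics.fmean(b) for b in batches]
batch_normalized = [batch_standardize(b) for b in batches]

print("global mean:", round(mean, 2))
print("batch means range from", round(min(batch_means), 2),
      "to", round(max(batch_means), 2))
```

The per-batch means scatter around the global mean, so the same raw input value is mapped to slightly different normalized values depending on which batch it lands in. With precomputed statistics the mapping is fixed and costs one pass over the data.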

169

answered Oct 24 '22 01:10

Maxim
