For batch normalization during testing, how does one calculate the mean and variance of each activation input (in each layer and input dimension)? Does one record the means and variances from training, calculate the means and variances of the entire training set, or calculate the means and variances of the entire test set?
Many people say the means and variances have to be precalculated. But if you compute them over the entire test set, wouldn't you have to do that during forward propagation rather than beforehand?
Thank you so much for all your help!
When you are predicting on the test set, you always use the training set's statistics, whether for a simple transformation (such as standardizing the inputs) or for batch normalization.
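A minimal NumPy sketch of this idea follows. It assumes one common approach: during training the layer normalizes with the current mini-batch's statistics and also keeps an exponential moving average of the per-feature mean and variance, and at test time it uses only those recorded averages. An alternative is to compute the statistics once over the whole training set after training; either way, they come from training data. The class name, `momentum` and `eps` values are illustrative choices, not taken from any particular framework.

```python
import numpy as np


class BatchNorm1D:
    """Sketch of batch normalization for inputs of shape (N, D)."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.momentum = momentum             # weight given to the old running value
        self.eps = eps                       # avoids division by zero
        # Running statistics, updated only in training mode
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training):
        if training:
            # Normalize with the statistics of the current mini-batch
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Record them (as moving averages) for use at test time
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            # Test mode: use the recorded training statistics only
            mean = self.running_mean
            var = self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


# Usage: train on batches drawn from N(5, 2^2), then predict on a single sample
rng = np.random.default_rng(0)
bn = BatchNorm1D(num_features=3)
for _ in range(200):
    batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))
    bn.forward(batch, training=True)

single_sample = np.array([[5.0, 7.0, 3.0]])
print(bn.forward(single_sample, training=False))
```

Because test mode never reads batch statistics, the layer works even for a batch of one sample, and its output does not depend on which other samples happen to be in the test batch.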
I'd recommend the cs231n course to learn more about this. Here is how I coded batch normalization while taking that course: github link.
If the test statistics differ significantly from the training statistics, the test data are different in general and the model won't work well on them; in that case you'll need different training data anyway. More precisely: a model trained on data processed in one way won't work well on data processed in a different way.
First, imagine there is only one test sample, e.g. you want to make a prediction for a single client. You simply can't calculate meaningful test statistics in this case. Second, consider what batch normalization does: after normalization, each value shows by how many standard deviations the original value differs from a certain average, namely the one computed on the training data, and the model learns to use this information during training. If you normalize test data using test statistics, the values will instead show deviation from a different average.
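Both points can be illustrated with a small NumPy sketch on a hypothetical one-dimensional feature (training data drawn from a normal distribution with mean 10 and standard deviation 2; the numbers are made up for illustration). A single sample has a variance of 0 and is its own mean, so normalizing it with its own statistics maps any input to 0. For a shifted test batch, normalizing with its own statistics erases the shift, while normalizing with the training statistics preserves it.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 1e-5

# Hypothetical training feature: mean 10, std 2
train = rng.normal(loc=10.0, scale=2.0, size=10_000)
train_mean, train_var = train.mean(), train.var()


def normalize(x, mean, var):
    # Standardize: how many standard deviations x lies from the given mean
    return (x - mean) / np.sqrt(var + eps)


# Case 1: a single test sample. Its own statistics are degenerate
# (mean equals the sample, variance is 0), so every input maps to 0.
one = np.array([14.0])
print(normalize(one, one.mean(), one.var()))   # [0.]
print(normalize(one, train_mean, train_var))   # distance from the training mean, in training stds

# Case 2: a test batch whose distribution is shifted (mean 13 instead of 10).
test = rng.normal(loc=13.0, scale=2.0, size=1_000)
z_test_stats = normalize(test, test.mean(), test.var())    # shift is hidden
z_train_stats = normalize(test, train_mean, train_var)     # shift stays visible
print(z_test_stats.mean(), z_train_stats.mean())
```

With test statistics, the shifted batch looks just like the training data to the model, so any real difference between the two distributions is silently hidden.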