Batch normalization uses a mini-batch mean and variance to normalize layer outputs. If I train a network with a batch size of, say, 100, but then want to use the trained network for single-shot predictions (batch size 1), should I expect to run into problems? Should I penalize the batch norm layer to converge towards the identity transform during training to avoid this?
No, you should not run into problems, and there is no need to push the layer towards the identity transform. At test time the batch normalization layer does not compute statistics from the incoming batch; it simply scales and shifts its inputs using fixed population estimates of the mean and variance (typically running averages accumulated during training) together with the learned scale and shift parameters. Because these factors are fixed, the layer behaves the same for any batch size, including 1.
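As a minimal sketch of this behaviour, assuming PyTorch (the toy network, layer sizes, and tensors below are illustrative, not from the question): in training mode the batch norm layer uses batch statistics and updates its running averages, while in evaluation mode it uses those stored averages, so a batch of size 1 works fine.

```python
import torch
import torch.nn as nn

# Hypothetical toy network containing a batch norm layer
net = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# Training-mode forward pass with batch size 100:
# the layer normalizes with batch statistics and updates its running mean/var.
net.train()
_ = net(torch.randn(100, 10))

# Evaluation mode: the layer now scales and shifts with the stored running
# statistics and the learned affine parameters, so batch size 1 is no problem.
net.eval()
with torch.no_grad():
    single_prediction = net(torch.randn(1, 10))
print(single_prediction.shape)  # torch.Size([1, 1])
```

Note that forgetting to call `net.eval()` is the usual source of trouble here: in training mode a batch of size 1 would force the layer to normalize with degenerate batch statistics.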