I am using the official Batch Normalization (BN) function (tf.contrib.layers.batch_norm()) of TensorFlow on the MNIST dataset. I use the following code to add BN:
local4_bn = tf.contrib.layers.batch_norm(local4, is_training=True)
During testing, I set is_training=False in the above line of code and observe only 20% accuracy. However, I get ~99% accuracy if I keep is_training=True during testing as well, with a batch size of 100 images. This observation suggests that the exponential moving average and variance computed by batch_norm() are probably incorrect, or that I am missing something in my code.
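For reference, this is roughly how the two phases look in my code (a sketch; only the batch_norm call is from my actual code, the surrounding names are illustrative):
# Training graph: statistics come from the current mini-batch
local4_bn = tf.contrib.layers.batch_norm(local4, is_training=True)
# Test graph: only the flag changes; I expect the layer to use the moving mean/variance accumulated during training
local4_bn = tf.contrib.layers.batch_norm(local4, is_training=False)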
Can anyone please suggest a solution to the above problem?
You get ~99% accuracy when you test your model with is_training=True only because of the batch size of 100. If you change the batch size to 1, your accuracy will decrease.
This is because, with is_training=True, the mean and variance are computed from the current input batch, and the layer's output is (batch-)normalized using those values.
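To see why the batch size matters, here is a small NumPy sketch (purely illustrative, not part of the model) comparing the statistics of a batch of 100 with those of a batch of 1:
import numpy as np
np.random.seed(0)
activations = np.random.normal(loc=5.0, scale=2.0, size=10000)  # hypothetical activation distribution
batch_100 = activations[:100]
batch_1 = activations[:1]
# With 100 samples the batch mean/variance are close to the true ones (~5.0 and ~4.0),
# so normalizing with them at test time happens to work.
print(batch_100.mean(), batch_100.var())
# With 1 sample the variance is 0 and the mean is the sample itself,
# so the batch statistics are useless and accuracy collapses.
print(batch_1.mean(), batch_1.var())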
The batch_norm function has a variables_collections parameter that lets you store the computed moving average and variance during the training phase and reuse them during the test phase.
If you define a collection for these variables, the batch_norm layer will use them during the test phase instead of computing new values.
Therefore, if you change your batch normalization layer definition to
local4_bn = tf.contrib.layers.batch_norm(local4, is_training=True, variables_collections=["batch_norm_non_trainable_variables_collection"])
the layer will store the computed variables in the "batch_norm_non_trainable_variables_collection" collection.
In the test phase, when you pass is_training=False, the layer will reuse the computed values it finds in the collection, as shown in the sketch below.
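For example, the corresponding test-time definition (a sketch; same collection name, only the flag changes) would be:
local4_bn = tf.contrib.layers.batch_norm(
    local4,
    is_training=False,  # reuse the stored moving mean/variance instead of batch statistics
    variables_collections=["batch_norm_non_trainable_variables_collection"])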
Note that the moving average and the variance are not trainable parameters; therefore, if you save only your model's trainable parameters in the checkpoint files, you have to manually add the non-trainable variables stored in the previously defined collection.
You can do this when you create the Saver object:
saver = tf.train.Saver(tf.trainable_variables() + tf.get_collection_ref("batch_norm_non_trainable_variables_collection") + otherlistofvariables)
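At test time you then restore the same variable list before running the evaluation, for example (a sketch, assuming a session sess and a hypothetical checkpoint path):
saver.save(sess, "model.ckpt")     # training phase: trainable weights + moving statistics
saver.restore(sess, "model.ckpt")  # test phase: restore them before evaluating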
In addition, since batch normalization can limit the expressive power of the layer it is applied to (because it restricts the range of the values), you should let the network learn the parameters gamma and beta (the affine transformation coefficients described in the paper), which allow the network to learn an affine transformation that increases the representational power of the layer.
You can enable the learning of these parameters by setting the center and scale parameters of the batch_norm function to True, in this way:
local4_bn = tf.contrib.layers.batch_norm(
local4,
is_training=True,
center=True, # beta
scale=True, # gamma
variables_collections=["batch_norm_non_trainable_variables_collection"])
I encountered the same problem when working with MNIST. My training accuracy is normal, while the test accuracy is very low at the beginning and then grows gradually.
I changed the default momentum=0.99 to momentum=0.9, and then it works fine. My source code is here:
mnist_bn_fixed.py
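For reference, a sketch of that change, assuming tf.layers.batch_normalization is used (there the argument is called momentum; in tf.contrib.layers.batch_norm the analogous argument is decay):
# A lower momentum/decay makes the moving mean and variance track the
# batch statistics faster, so test accuracy recovers sooner.
x_bn = tf.layers.batch_normalization(x, momentum=0.9, training=is_training)  # is_training: bool or placeholder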