 

Getting low test accuracy using Tensorflow batch_norm function

I am using TensorFlow's official batch normalization (BN) function, tf.contrib.layers.batch_norm(), on the MNIST data. I add BN with the following code:

local4_bn = tf.contrib.layers.batch_norm(local4, is_training=True)

During testing, I set is_training=False in the line above and observe only 20% accuracy. However, if I keep is_training=True during testing as well, I get ~99% accuracy with a batch size of 100 images. This suggests that the exponential moving mean and variance computed by batch_norm() are probably incorrect, or that I am missing something in my code.
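
A minimal sketch of the train/test switch being described, assuming the flag is driven by a boolean placeholder (one common pattern; the tensor names are illustrative):

import tensorflow as tf

# Illustrative input; in the real model local4 is the previous layer's output.
local4 = tf.placeholder(tf.float32, [None, 128])

# Driving the flag from a placeholder lets one graph serve training and testing.
is_training = tf.placeholder(tf.bool, name="is_training")
local4_bn = tf.contrib.layers.batch_norm(local4, is_training=is_training)

# feed_dict={is_training: True}  for training steps
# feed_dict={is_training: False} for evaluation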

Can anyone suggest a solution to this problem?

Hasnat asked Oct 17 '16


2 Answers

You get ~99% accuracy when you test your model with is_training=True only because of the batch size of 100. If you change the batch size to 1, your accuracy will drop.

This is because with is_training=True the layer computes the mean and variance of the current input batch and then (batch-)normalizes its output using those values.
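
To see why the batch size matters, here is a small numpy sketch of the arithmetic that is_training=True performs (an illustration, not the TensorFlow implementation):

import numpy as np

# Training-mode BN: normalize with statistics of the current batch.
def batch_norm_train(x, eps=0.001):
    mu = x.mean(axis=0)    # per-feature batch mean
    var = x.var(axis=0)    # per-feature batch variance
    return (x - mu) / np.sqrt(var + eps)

batch = np.random.randn(100, 4) * 3.0 + 5.0
print(batch_norm_train(batch).std(axis=0))  # roughly 1 for a batch of 100

# A batch of 1 has zero variance, so the output collapses to all zeros.
print(batch_norm_train(batch[:1]))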

The batch_norm function has a variables_collections parameter that lets you store the computed moving mean and variance during the training phase and reuse them during the test phase.

If you define a collection for these variables, then the batch_norm layer will use them during the testing phase, instead of calculating new values.

Therefore, if you change your batch normalization layer definition to

local4_bn = tf.contrib.layers.batch_norm(local4, is_training=True, variables_collections=["batch_norm_non_trainable_variables_collection"])

the layer will store the computed variables in the "batch_norm_non_trainable_variables_collection" collection.

In the test phase, when you pass is_training=False, the layer will reuse the computed values that it finds in the collection.
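
Concretely, the test-phase layer definition would then be (same collection name as above):

local4_bn = tf.contrib.layers.batch_norm(
    local4,
    is_training=False,  # use the stored moving mean and variance
    variables_collections=["batch_norm_non_trainable_variables_collection"])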

Note that the moving mean and variance are not trainable parameters; therefore, if you save only your model's trainable parameters in the checkpoint files, you have to manually add the non-trainable variables stored in the previously defined collection.

You can do it when you create the Saver object:

saver = tf.train.Saver(tf.trainable_variables() + tf.get_collection_ref("batch_norm_non_trainable_variables_collection") + other_list_of_variables)
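
At test time you then restore the checkpoint with the same Saver, for example (the checkpoint path is illustrative):

with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt")  # illustrative path
    # The moving mean and variance are now populated, so running the
    # graph with is_training=False uses the statistics learned in training.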

In addition, since batch normalization can limit the expressive power of the layer it is applied to (because it restricts the range of the values), you should let the network learn the parameters gamma and beta (the affine transformation coefficients described in the paper), so that it can learn an affine transformation that increases the representational power of the layer.

You can enable the learning of these parameters by setting the center and scale parameters of the batch_norm function to True, like this:

local4_bn = tf.contrib.layers.batch_norm(
    local4,
    is_training=True,
    center=True,  # learn beta
    scale=True,   # learn gamma
    variables_collections=["batch_norm_non_trainable_variables_collection"])
nessuno answered Nov 02 '22


I encountered the same problem when processing MNIST. My training accuracy was normal, while the test accuracy was very low at the beginning and then grew gradually.


I changed the default momentum=0.99 to momentum=0.9, and then it worked fine. My source code is here:

mnist_bn_fixed.py
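
For reference, the parameter name depends on the API: tf.layers.batch_normalization calls it momentum (default 0.99), while tf.contrib.layers.batch_norm calls the same factor decay (default 0.999). A sketch of the change (tensor names are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128])
is_training = tf.placeholder(tf.bool)

# With tf.layers.batch_normalization: a lower momentum makes the moving
# statistics track the batch statistics faster.
h = tf.layers.batch_normalization(x, momentum=0.9, training=is_training)

# With tf.contrib.layers.batch_norm the equivalent argument is decay.
h2 = tf.contrib.layers.batch_norm(x, decay=0.9, is_training=is_training)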

huosan0123 answered Nov 02 '22