
Keras BatchNormalization: differing results in training and evaluation on the training dataset

I'm training a CNN. For the sake of debugging my problem I am working on a small subset of the actual training data.

During training the loss and accuracy seem very reasonable and pretty good. (In the example I used the same small subset for validation; the problem already shows up there.)

Fit on x_train and validate on x_train, using batch_size=32

Epoch 10/10
1/10 [==>...........................] - ETA: 2s - loss: 0.5126 - acc: 0.7778
2/10 [=====>........................] - ETA: 1s - loss: 0.3873 - acc: 0.8576
3/10 [========>.....................] - ETA: 1s - loss: 0.3447 - acc: 0.8634
4/10 [===========>..................] - ETA: 1s - loss: 0.3320 - acc: 0.8741
5/10 [==============>...............] - ETA: 0s - loss: 0.3291 - acc: 0.8868
6/10 [=================>............] - ETA: 0s - loss: 0.3485 - acc: 0.8848
7/10 [====================>.........] - ETA: 0s - loss: 0.3358 - acc: 0.8879
8/10 [=======================>......] - ETA: 0s - loss: 0.3315 - acc: 0.8863
9/10 [==========================>...] - ETA: 0s - loss: 0.3215 - acc: 0.8885
10/10 [==============================] - 3s - loss: 0.3106 - acc: 0.8863 - val_loss: 1.5021 - val_acc: 0.2707

When I evaluate on the same training dataset, however, the accuracy is really off from what I saw during training (I would expect it to be at least as good as during training on the same dataset).

When evaluating straightforwardly, or after calling

K.set_learning_phase(0)

I get results similar to the validation run (evaluating on x_train with batch_size=32):

Evaluation Accuracy: 0.266318537392, Loss:  1.50756853772 


If I set the backend to the learning phase, the results get pretty good again, so the per-batch normalization seems to work well. I suspect that the accumulated mean and variance are not being used properly.
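The effect can be illustrated without Keras at all. Here is a small NumPy sketch (my own illustration, with made-up numbers): in training mode, BatchNormalization standardizes each batch with that batch's own statistics, while in inference mode it uses the accumulated moving averages, which after a short run over a small dataset may still sit close to their initial values (mean 0, variance 1):

```python
import numpy as np

rng = np.random.default_rng(0)
# Activations whose true distribution (mean 5, std 2) the moving
# averages have NOT yet caught up with:
x = rng.normal(loc=5.0, scale=2.0, size=32)

gamma, beta, eps = 1.0, 0.0, 1e-3  # identity scale/shift for simplicity

# Training mode: normalize with the current batch's statistics
train_out = gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

# Inference mode: normalize with the moving averages, here still at
# their initial values (mean 0, variance 1)
moving_mean, moving_var = 0.0, 1.0
eval_out = gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

# training-mode output is standardized (mean ~0); inference-mode
# output keeps the raw offset, so downstream layers see very
# different inputs in the two phases
print(train_out.mean(), eval_out.mean())
```

This mismatch between batch statistics and stale moving statistics is exactly what a large train/eval accuracy gap on the *same* data looks like.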

So after

K.set_learning_phase(1)

I get (Evaluating on x_train using batch_size=32):

Evaluation Accuracy: 0.887728457507, Loss:  0.335956037511


I added the BatchNormalization layer after the first convolutional layer like this:
model = models.Sequential()
model.add(Conv2D(80, first_conv_size, strides=2, activation="relu", input_shape=input_shape, padding=padding_name))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(first_max_pool_size, strides=4, padding=padding_name))
...

Further down the line I would also have some dropout layers, which I removed to investigate the BatchNormalization behavior. My intent is to use the model in the non-training phase for normal prediction.

Shouldn't it work like that, or am I missing some additional configuration?

Thanks!

I'm using Keras 2.0.8 with TensorFlow 1.1.0 (Anaconda).

Asked by oole, Jan 12 '18


1 Answer

This is really annoying. When you set the learning_phase to True, a BatchNormalization layer gets its normalization statistics straight from the current batch of data, which can be a problem when you have a small batch_size. I came across a similar issue some time ago, and here is my solution:

  1. When building the model, add an option for whether it will predict in the learning or the non-learning phase, and in the version used in the learning phase use the following class instead of BatchNormalization:

    class NonTrainableBatchNormalization(BatchNormalization):
        """
        This class makes it possible to freeze batch normalization while
        Keras is in the training phase.
        """
        def call(self, inputs, training=None):
            return super(
                NonTrainableBatchNormalization, self).call(inputs, training=False)
    
  2. Once you have trained your model, copy its weights into the NonTrainable copy:

    learning_phase_model.set_weights(learned_model.get_weights())
    

Now you can fully enjoy using BatchNormalization in the learning_phase.
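Putting the two steps together, here is a minimal sketch of the pattern. It is written with tf.keras (which differs slightly from the Keras 2.0.8 the question uses), and the tiny architecture and layer sizes are hypothetical, just to keep it self-contained:

```python
import numpy as np
from tensorflow.keras import layers, models


class NonTrainableBatchNormalization(layers.BatchNormalization):
    """Always normalize with the moving statistics, even in the learning phase."""
    def call(self, inputs, training=None):
        return super(NonTrainableBatchNormalization, self).call(
            inputs, training=False)


def build_model(bn_layer_cls):
    # Same architecture twice; only the BatchNormalization class differs.
    return models.Sequential([
        layers.Conv2D(8, 3, activation="relu", input_shape=(16, 16, 1)),
        bn_layer_cls(axis=-1),
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),
    ])


learned_model = build_model(layers.BatchNormalization)  # train this one
learning_phase_model = build_model(NonTrainableBatchNormalization)

# After training, transfer the weights, which include the accumulated
# moving mean and variance, into the frozen-BN copy:
learning_phase_model.set_weights(learned_model.get_weights())
```

The two models share identical weight shapes, so `set_weights(get_weights())` is a straight copy; the frozen-BN copy then normalizes with the learned moving statistics regardless of the learning phase.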

Answered by Marcin Możejko, Oct 25 '22