
Tensorflow and Batch Normalization with Batch Size==1 => Outputs all zeros

I have a question about how Batch Normalization (BN from here on) actually works.

I have a convnet that works nicely. While writing tests to check the shapes and output ranges, I noticed that when I set batch_size = 1, my model outputs zeros (both logits and activations).

I prototyped the simplest convnet with BN:

Input => Conv + ReLU => BN => Conv + ReLU => BN => Conv Layer + Tanh

The model is initialized with Xavier initialization. My guess is that during training BN does some calculation that requires batch_size > 1.
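For reference, the behaviour can be reproduced without TensorLayer (a minimal sketch using plain TF 1.x ops, assuming tf.nn.moments and tf.nn.batch_normalization; the full script is below): with a single 1x1 sample, the batch statistics are equal to the sample itself, so the normalized output is zero.

import numpy as np
import tensorflow as tf

# Normalize a 1x1 feature map with statistics computed over the batch itself.
x = tf.placeholder(tf.float32, [None, 1, 1, 3])
mean, var = tf.nn.moments(x, axes=[0, 1, 2])      # per-channel batch statistics
y = tf.nn.batch_normalization(
    x, mean, var,
    offset=None, scale=None,
    variance_epsilon=1e-5
)

with tf.Session() as sess:
    for batch_size in [1, 2]:
        out = sess.run(y, feed_dict={x: np.random.random((batch_size, 1, 1, 3))})
        print(batch_size, out.ravel())            # batch_size == 1 => all zeros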

I found a PyTorch issue that seems to discuss this: https://github.com/pytorch/pytorch/issues/1381

Could anyone explain this? It's still a little unclear to me.


Example Run:

Important: the TensorLayer library is required to run this script: pip install tensorlayer

import tensorflow as tf
import tensorlayer as tl

import numpy as np

def conv_net(inputs, is_training):

    xavier_initilizer = tf.contrib.layers.xavier_initializer(uniform=True)
    normal_initializer = tf.random_normal_initializer(mean=1., stddev=0.02)

    # Input Layers
    network = tl.layers.InputLayer(inputs, name='input')

    fx = [64, 128, 256, 256, 256]

    for i, n_out_channel in enumerate(fx):

        with tf.variable_scope('h' + str(i + 1)):

            network = tl.layers.Conv2d(
                network,
                n_filter    = n_out_channel,
                filter_size = (5, 5),
                strides     = (2, 2),
                padding     = 'VALID',
                act         = tf.identity,
                W_init      = xavier_initilizer,
                name        = 'conv2d'
            )

            network = tl.layers.BatchNormLayer(
                network,
                act        = tf.identity,
                is_train   = is_training,
                gamma_init = normal_initializer,
                name       = 'batch_norm'
            )

            network = tl.layers.PReluLayer(
                layer  = network,
                a_init = tf.constant_initializer(0.2),
                name   ='activation'
            )

    ############# OUTPUT LAYER ###############

    with tf.variable_scope('h' + str(len(fx) + 1)):
        '''

        network = tl.layers.FlattenLayer(network, name='flatten')

        network = tl.layers.DenseLayer(
            network,
            n_units = 100,
            act     = tf.identity,
            W_init  = xavier_initilizer,
            name    = 'dense'
        )

        '''

        output_filter_size = tuple([int(i) for i in network.outputs.get_shape()[1:3]])

        network = tl.layers.Conv2d(
            network,
            n_filter    = 100,
            filter_size = output_filter_size,
            strides     = (1, 1),
            padding     = 'VALID',
            act         = tf.identity,
            W_init      = xavier_initilizer,

            name        = 'conv2d'
        )

        network = tl.layers.BatchNormLayer(
            network,
            act        = tf.identity,
            is_train   = is_training,
            gamma_init = normal_initializer,
            name       = 'batch_norm'
        )

        net_logits = network.outputs

        network.outputs = tf.nn.tanh(
            x        = network.outputs,
            name     = 'activation'
        )

        net_output = network.outputs

    return network, net_output, net_logits


if __name__ == '__main__':

    tf.logging.set_verbosity(tf.logging.DEBUG)

    #################################################
    #                MODEL DEFINITION               #
    #################################################

    PLH_SHAPE = [None, 256, 256, 3]

    input_plh = tf.placeholder(tf.float32, PLH_SHAPE, name='input_placeholder')

    convnet, net_out, net_logits = conv_net(input_plh, is_training=True)


    with tf.Session() as sess:
        tl.layers.initialize_global_variables(sess)

        convnet.print_params(details=True)

        #################################################
        #                  LAUNCH A RUN                 #
        #################################################

        for BATCH_SIZE in [1, 2]:

            INPUT_SHAPE = [BATCH_SIZE, 256, 256, 3]

            batch_data = np.random.random(size=INPUT_SHAPE)

            output, logits = sess.run(
                [net_out, net_logits],
                feed_dict={
                    input_plh: batch_data
                }
            )

            if tf.logging.get_verbosity() == tf.logging.DEBUG:
                print("\n\n###########################")

                print("\nBATCH SIZE = %d\n" % BATCH_SIZE)

            tf.logging.debug("output => Shape: %s - Mean: %e - Std: %f - Min: %f - Max: %f" % (
                output.shape,
                output.mean(),
                output.std(),
                output.min(),
                output.max()
            ))

            tf.logging.debug("logits => Shape: %s - Mean: %e - Std: %f - Min: %f - Max: %f" % (
                logits.shape,
                logits.mean(),
                logits.std(),
                logits.min(),
                logits.max()
            ))

            if tf.logging.get_verbosity() == tf.logging.DEBUG:
                print("###########################")

This gives the following output:

###########################

BATCH SIZE = 1

DEBUG:tensorflow:output => Shape: (1, 1, 1, 100) - Mean: 0.000000e+00 - Std: 0.000000 - Min: 0.000000 - Max: 0.000000
DEBUG:tensorflow:logits => Shape: (1, 1, 1, 100) - Mean: 0.000000e+00 - Std: 0.000000 - Min: 0.000000 - Max: 0.000000
###########################


###########################

BATCH SIZE = 2

DEBUG:tensorflow:output => Shape: (2, 1, 1, 100) - Mean: -1.430511e-08 - Std: 0.760749 - Min: -0.779634 - Max: 0.779634
DEBUG:tensorflow:logits => Shape: (2, 1, 1, 100) - Mean: -4.768372e-08 - Std: 0.998715 - Min: -1.044437 - Max: 1.044437
###########################
asked Jan 18 '18 by Jonathan DEKHTIAR




2 Answers

You should probably read an explanation of Batch Normalization, such as this one. You can also take a look at TensorFlow's related documentation.

Basically, there are two ways to do batch norm, and both have problems dealing with a batch size of 1:

  • Using a moving mean and variance per pixel (per spatial position), so they are tensors with the same shape as each sample in your batch. This is the variant used in @layog's answer and (I think) in the original paper, and the most commonly used one.

  • Using a moving mean and variance over the entire image / feature space, so they are just vectors (rank 1) of shape (n_channels,).

In both cases, you'll have:

output = gamma * (input - mean) / sigma + beta

Beta is often set to 0 and gamma to 1, since you have linear functions right after BN.

During training, mean and variance are computed across the current batch, which causes problems when the batch has size 1:

  • in the 1st case, you'll get mean=input, so output=0
  • in the 2nd case, mean will be the average value over all pixels, so it's better; but if your width and height are also 1, then you get mean=input again, so you get output=0.

I think most people (and the original method) use the 1st way, which is why you get 0 (although the TF doc seems to suggest that the 2nd method is common too). The argument in the link you provided seems to be considering the 2nd method.

In any case (whichever you're using), with BN you'll only get good results if you use a bigger batch size (say, at least 10).
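A minimal NumPy sketch of both variants (illustrative only; batch_norm here is a hypothetical helper, not a TF or TensorLayer function) shows why a batch of one 1x1 feature map comes out as all zeros either way:

import numpy as np

def batch_norm(x, axes, eps=1e-5, gamma=1.0, beta=0.0):
    # Normalize x with mean/variance computed over the given axes.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.random((1, 1, 1, 100))          # batch of 1, 1x1 spatial map

# 1st way: statistics per position (reduce over the batch axis only)
print(batch_norm(x, axes=(0,)).std())         # 0.0 -- mean == input

# 2nd way: statistics per channel (reduce over batch and spatial axes)
print(batch_norm(x, axes=(0, 1, 2)).std())    # also 0.0 here, since H == W == 1

x2 = np.random.random((2, 1, 1, 100))         # batch of 2
print(batch_norm(x2, axes=(0, 1, 2)).std())   # ~1.0, like the BATCH SIZE = 2 run above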

answered Oct 25 '22 by gdelab


Batch Normalization normalizes each output over the complete batch, using the following formula (from the original paper):

x_hat = (x - mean_B) / sqrt(var_B + epsilon)

where mean_B and var_B are the mean and variance computed over the current batch B, and epsilon is a small constant added for numerical stability.

So suppose, for example, that you have the following outputs (of size 3) for a batch size of 2:

[2, 4, 6]
[4, 6, 8]

The mean of each output element over the batch is then

[3, 5, 7]

Now look at the numerator in the formula above: it subtracts the mean from each element of the output. But if the batch size is 1, the mean is exactly equal to the output, so the numerator evaluates to 0.

As a side note, the batch variance in the denominator also evaluates to 0, but since BN adds a small epsilon to the variance the division does not blow up, and with a zero numerator the result is simply 0 rather than NaN.
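The same arithmetic in a few lines of NumPy (an illustrative sketch of the example above):

import numpy as np

batch = np.array([[2., 4., 6.],
                  [4., 6., 8.]])               # batch size 2, output size 3
mean = batch.mean(axis=0)                      # [3. 5. 7.]
var = batch.var(axis=0)                        # [1. 1. 1.]
print((batch - mean) / np.sqrt(var + 1e-5))    # roughly [[-1 -1 -1], [ 1  1  1]]

single = np.array([[2., 4., 6.]])              # batch size 1
mean = single.mean(axis=0)                     # identical to the sample itself
var = single.var(axis=0)                       # [0. 0. 0.]
print((single - mean) / np.sqrt(var + 1e-5))   # [[0. 0. 0.]] -- numerator is 0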

answered Oct 25 '22 by layog