
Cannot use both bias and batch normalization in convolution layers


I use the slim framework for TensorFlow because of its simplicity. But I want to have a convolutional layer with both biases and batch normalization. In vanilla TensorFlow, I have:

    def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, name="conv2d"):
        with tf.variable_scope(name):
            w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                                initializer=tf.contrib.layers.xavier_initializer(uniform=False))
            conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

            biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
            conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())

            tf.summary.histogram("weights", w)
            tf.summary.histogram("biases", biases)

            return conv

    d_bn1 = BatchNorm(name='d_bn1')
    h1 = lrelu(d_bn1(conv2d(h0, df_dim + y_dim, name='d_h1_conv')))

and I rewrote it for slim like this:

    h1 = slim.conv2d(h0,
                     num_outputs=self.df_dim + self.y_dim,
                     scope='d_h1_conv',
                     kernel_size=[5, 5],
                     stride=[2, 2],
                     activation_fn=lrelu,
                     normalizer_fn=layers.batch_norm,
                     normalizer_params=batch_norm_params,
                     weights_initializer=layers.xavier_initializer(uniform=False),
                     biases_initializer=tf.constant_initializer(0.0))

But this code does not add a bias to the conv layer. That is because of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L1025 where there is

    layer = layer_class(filters=num_outputs,
                        kernel_size=kernel_size,
                        strides=stride,
                        padding=padding,
                        data_format=df,
                        dilation_rate=rate,
                        activation=None,
                        use_bias=not normalizer_fn and biases_initializer,
                        kernel_initializer=weights_initializer,
                        bias_initializer=biases_initializer,
                        kernel_regularizer=weights_regularizer,
                        bias_regularizer=biases_regularizer,
                        activity_regularizer=None,
                        trainable=trainable,
                        name=sc.name,
                        dtype=inputs.dtype.base_dtype,
                        _scope=sc,
                        _reuse=reuse)
    outputs = layer.apply(inputs)

in the construction of the layer, which results in no bias being added when batch normalization is used. Does that mean that I cannot have both biases and batch normalization using slim and the layers library? Or is there another way to get both bias and batch normalization in a layer when using slim?
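To make the mechanism explicit, here is a small plain-Python illustration (placeholder objects, not the actual slim code) of how that use_bias expression evaluates:

    # Placeholder objects standing in for the arguments passed to slim.conv2d.
    normalizer_fn = object()        # e.g. layers.batch_norm was supplied
    biases_initializer = object()   # e.g. tf.constant_initializer(0.0)

    use_bias = not normalizer_fn and biases_initializer
    print(bool(use_bias))           # False -> the conv layer is built without a bias

    normalizer_fn = None            # no batch norm requested
    use_bias = not normalizer_fn and biases_initializer
    print(bool(use_bias))           # True -> the bias variable is created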

asked Sep 16 '17 by Matěj Račinský

People also ask

Can we use dropout and batch normalization together?

The literature on the disharmony between Dropout and Batch Norm (BN) suggests not using them together at all. Dropout changes the standard deviation of the distribution during training but doesn't change the distribution during validation, which creates a variance shift between training and inference.

When should I use batch normalization in CNN?

It can be used at several points between the layers of the model. It is often placed just after the convolution and pooling layers of a sequential model. The sketch below shows one way to place BatchNormalization layers in a classifier for handwritten digits.
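For illustration only (this snippet is not from the original answer), a minimal Keras sketch with BatchNormalization after each convolution layer might look like:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Small CNN for 28x28 grayscale digits; BatchNormalization follows each conv layer.
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])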

Why is batch normalization used in CNN architecture?

Batch Norm is a normalization technique applied between the layers of a neural network instead of to the raw data. It is computed along mini-batches instead of over the full data set. It serves to speed up training and allows higher learning rates, making learning easier: each layer's output is normalized using the mean and the standard deviation of the neurons' output over the current mini-batch.

Why is batch normalization not used in RNNs?

No, you cannot use Batch Normalization on a recurrent neural network as-is: the statistics are computed per batch, which does not account for the recurrent part of the network. Weights are shared in an RNN, and the activation response for each "recurrent loop" might have completely different statistical properties.


1 Answer

Batch normalization already includes the addition of a bias term. Recall that BatchNorm is already:

gamma * normalized(x) + bias 

So there is no need (and it makes no sense) to add another bias term in the convolution layer. Simply speaking, BatchNorm shifts the activations by their mean values; hence, any constant added before it will be canceled out.
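As a quick sanity check, here is a small NumPy sketch (synthetic data; a simplified batch_normalize helper with gamma=1, beta=0) showing that a constant added before normalization cancels out:

    import numpy as np

    x = np.random.randn(8, 4).astype(np.float32)   # a batch of pre-activations
    b = 3.7                                         # some constant "conv bias"

    def batch_normalize(v, eps=1e-5):
        # Normalize over the batch dimension, as BatchNorm does (gamma=1, beta=0).
        return (v - v.mean(axis=0)) / np.sqrt(v.var(axis=0) + eps)

    # Adding a constant before normalization leaves the output unchanged.
    print(np.allclose(batch_normalize(x), batch_normalize(x + b)))   # True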

If you still want to do this, you need to remove the normalizer_fn argument and add BatchNorm as a separate layer. Like I said, this makes no sense.

But the solution would be something like

    net = slim.conv2d(net, normalizer_fn=None, ...)
    net = tf.nn.batch_normalization(net)
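A slightly fuller sketch of the same idea (the num_outputs value and the is_training flag are just placeholders), using slim.batch_norm as the standalone layer:

    # Keep the conv bias by disabling the fused normalizer, then apply batch norm
    # as its own layer.
    net = slim.conv2d(net,
                      num_outputs=64,
                      kernel_size=[5, 5],
                      stride=2,
                      normalizer_fn=None,
                      biases_initializer=tf.constant_initializer(0.0))
    net = slim.batch_norm(net, is_training=is_training)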

Note that BatchNorm relies on non-gradient updates (its moving mean and variance). So you either need to use an optimizer that is compatible with the UPDATE_OPS collection, or you need to manually add tf.control_dependencies.
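For example, a minimal TF 1.x sketch of the tf.control_dependencies pattern (the small fully-connected layer is only there so that UPDATE_OPS gets populated):

    import tensorflow as tf

    # Toy layer with batch norm so that moving-average update ops are created.
    x = tf.placeholder(tf.float32, [None, 10])
    net = tf.contrib.layers.fully_connected(
        x, 4,
        normalizer_fn=tf.contrib.layers.batch_norm,
        normalizer_params={'is_training': True})
    loss = tf.reduce_mean(tf.square(net))

    # Run the BatchNorm moving-average updates before every optimizer step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)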

Long story short: even if you implement ConvWithBias+BatchNorm, it will behave like ConvWithoutBias+BatchNorm, just as multiple fully-connected layers without an activation function behave like a single one.

answered Sep 20 '22 by Patwie