
Why use fixed padding when building a ResNet model in TensorFlow?

TensorFlow has an official implementation of ResNet on GitHub. It uses fixed padding instead of plain tf.layers.conv2d.

Something like this:

def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
  """Strided 2-D convolution with explicit padding."""
  # The padding is consistent and is based only on `kernel_size`, not on the
  # dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone).
  if strides > 1:
    inputs = fixed_padding(inputs, kernel_size, data_format)

  return tf.layers.conv2d(
      inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides,
      padding=('SAME' if strides == 1 else 'VALID'), use_bias=False,
      kernel_initializer=tf.variance_scaling_initializer(),
      data_format=data_format)

What's the purpose of doing this? If we feed in a 32x32 image and use tf.layers.conv2d with padding SAME and stride 2, we already get a 16x16 feature map. The code above instead zero-pads both sides of the image and then uses padding VALID.

Keshawn Hsieh asked Dec 11 '17 01:12



2 Answers

Let's assume we have a stride of 2 and a kernel size of 3.

Using tf.layers.conv2d with padding SAME:

Case 1:

                   pad|              |pad
       inputs:      0 |1  2  3  4  5 |0 
                   |_______|
                         |_______|
                               |_______|

Case 2:

                                     |pad
       inputs:      1  2  3  4  5  6 |0 
                   |_______|
                         |_______|
                               |_______|

You can see that the padding depends on the input size. With SAME, the padding is chosen so that the output size is ceil(input_size / stride).
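The two cases above can be checked by hand. This is a minimal sketch in plain Python, assuming the usual TensorFlow convention for SAME (output size ceil(input_size / stride), with any odd leftover padding placed at the end). `same_padding` is a hypothetical helper for illustration, not a TensorFlow function:

```python
import math

def same_padding(input_size, kernel_size, stride):
    """Return (pad_before, pad_after) for one spatial dimension under SAME.

    Hypothetical helper, assumed to mirror TensorFlow's SAME rule.
    """
    output_size = math.ceil(input_size / stride)
    # Total padding needed so that `output_size` windows fit.
    pad_total = max((output_size - 1) * stride + kernel_size - input_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return pad_before, pad_after

print(same_padding(5, 3, 2))  # Case 1: (1, 1)
print(same_padding(6, 3, 2))  # Case 2: (0, 1)
```

For an odd input size the padding is symmetric, while for an even input size it all lands on one side, which matches the two diagrams.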

Using the fixed padding implementation of resnet:

Case 1:

                   pad|              |pad
       inputs:      0 |1  2  3  4  5 |0 
                   |_______|
                         |_______|
                               |_______|

Case 2:

                   pad|                 |pad
       inputs:      0 |1  2  3  4  5  6 |0 
                   |_______|
                         |_______|
                               |_______|

Padding is uniquely defined by the kernel size and stays independent of the input size.
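For comparison, here is a sketch of the fixed rule in plain Python. It assumes the padding is `pad_total = kernel_size - 1`, split with the smaller half in front, which is my understanding of the `fixed_padding` helper in the official model. The function names below are made up for illustration:

```python
def fixed_pad_amounts(kernel_size):
    """Return (pad_before, pad_after); depends only on the kernel size."""
    pad_total = kernel_size - 1
    pad_before = pad_total // 2
    return pad_before, pad_total - pad_before

def valid_windows(values, kernel_size, stride):
    """List the windows a VALID convolution would visit over a 1-D sequence."""
    last_start = len(values) - kernel_size
    return [values[i:i + kernel_size] for i in range(0, last_start + 1, stride)]

def fixed_padding_windows(values, kernel_size, stride):
    """Zero-pad with the fixed amounts, then slide VALID windows."""
    before, after = fixed_pad_amounts(kernel_size)
    padded = [0] * before + list(values) + [0] * after
    return valid_windows(padded, kernel_size, stride)

print(fixed_padding_windows([1, 2, 3, 4, 5], 3, 2))
# [[0, 1, 2], [2, 3, 4], [4, 5, 0]]
print(fixed_padding_windows([1, 2, 3, 4, 5, 6], 3, 2))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

In both cases the first window starts on a padded zero. With an input of size 6 the trailing zero is simply never reached by a window.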

Martin Mihaylov answered Oct 28 '22 23:10



As you know, ResNet has skip connections, in which the input of a block is added to its output.

The equation becomes the following:

F(x) + x   // Here 'x' is the input to the residual block, carried over by the skip connection.

For this addition to work, F(x) and x must have the same dimensions. If they do not, the convolutions have to be padded so that the shapes match.

This is why the stride-1 convolutions in the TensorFlow ResNet model use padding="SAME", while the strided ones pad explicitly and then use padding="VALID".
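To make the shape argument concrete, here is a small sketch in plain Python. `conv_output_size` is a hypothetical helper based on the standard output-size formulas, not part of TensorFlow:

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length of a 1-D convolution for 'SAME' or 'VALID' padding."""
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError('padding must be SAME or VALID')

x_size = 32
# Two stacked kernel-3, stride-1 convolutions stand in for F(x).
same = conv_output_size(conv_output_size(x_size, 3, 1, 'SAME'), 3, 1, 'SAME')
valid = conv_output_size(conv_output_size(x_size, 3, 1, 'VALID'), 3, 1, 'VALID')

print(same == x_size)   # True: F(x) + x is well defined
print(valid == x_size)  # False: F(x) shrank to 28, so the addition would fail
```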

Milind Deore answered Oct 28 '22 23:10
