 

Understanding the output shape of keras.layers.Conv2DTranspose

I am having a hard time understanding the output shape of keras.layers.Conv2DTranspose.

Here is the prototype:

keras.layers.Conv2DTranspose(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    output_padding=None,
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None
)

In the documentation (https://keras.io/layers/convolutional/), I read:

If output_padding is set to None (default), the output shape is inferred.

In the code (https://github.com/keras-team/keras/blob/master/keras/layers/convolutional.py), I read:

out_height = conv_utils.deconv_length(height,
                                      stride_h, kernel_h,
                                      self.padding,
                                      out_pad_h,
                                      self.dilation_rate[0])
out_width = conv_utils.deconv_length(width,
                                     stride_w, kernel_w,
                                     self.padding,
                                     out_pad_w,
                                     self.dilation_rate[1])
if self.data_format == 'channels_first':
    output_shape = (batch_size, self.filters, out_height, out_width)
else:
    output_shape = (batch_size, out_height, out_width, self.filters)

and (https://github.com/keras-team/keras/blob/master/keras/utils/conv_utils.py):

def deconv_length(dim_size, stride_size, kernel_size, padding, output_padding, dilation=1):

    """Determines output length of a transposed convolution given input length.
    # Arguments
        dim_size: Integer, the input length.
        stride_size: Integer, the stride along the dimension of `dim_size`.
        kernel_size: Integer, the kernel size along the dimension of `dim_size`.
        padding: One of `"same"`, `"valid"`, `"full"`.
        output_padding: Integer, amount of padding along the output dimension, can be set to `None` in which case the output length is inferred.
        dilation: dilation rate, integer.
    # Returns
        The output length (integer).
    """

    assert padding in {'same', 'valid', 'full'}
    if dim_size is None:
        return None

    # Get the dilated kernel size
    kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)

    # Infer length if output padding is None, else compute the exact length
    if output_padding is None:
        if padding == 'valid':
            dim_size = dim_size * stride_size + max(kernel_size - stride_size, 0)
        elif padding == 'full':
            dim_size = dim_size * stride_size - (stride_size + kernel_size - 2)
        elif padding == 'same':
            dim_size = dim_size * stride_size
    else:
        if padding == 'same':
            pad = kernel_size // 2
        elif padding == 'valid':
            pad = 0
        elif padding == 'full':
            pad = kernel_size - 1

        dim_size = ((dim_size - 1) * stride_size + kernel_size - 2 * pad + output_padding)

    return dim_size

I understand that Conv2DTranspose is kind of a Conv2D, but reversed.

Since applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 200x200 image will output a 20x20 image, I assume that applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 20x20 image will output a 200x200 image.

Also, applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 195x195 image will also output a 20x20 image.

So, I understand that there is kind of an ambiguity on the output shape when applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" (user might want output to be 195x195, or 200x200, or many other compatible shapes).
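The ambiguity can be checked directly with the output-size formula that Conv2D uses for padding="same", namely out = ceil(in / stride). This is a pure-Python sketch of that arithmetic, not actual Keras code:

```python
import math

# Conv2D with padding='same' maps an input length to ceil(input / stride),
# so several different input sizes collapse to the same output size.
def conv2d_same_out(in_size, stride):
    return math.ceil(in_size / stride)

print(conv2d_same_out(200, 10))  # 20
print(conv2d_same_out(195, 10))  # 20 -- a different input, same output
print(conv2d_same_out(191, 10))  # 20 -- every size in 191..200 maps to 20
```

Since ten different input widths (191 through 200) all produce width 20, a Conv2DTranspose going back from 20 cannot know which of them to reproduce.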

I assume that "the output shape is inferred" means that a default output shape is computed according to the parameters of the layer, and I assume that there is a mechanism to specify an output shape different from the default one, if necessary.

That said, I do not really understand:

  • the meaning of the "output_padding" parameter

  • the interactions between parameters "padding" and "output_padding"

  • the various formulas in the function keras.conv_utils.deconv_length

Could someone explain this?

Many thanks,

Julien

Asked Feb 18 '19 by Julien REINAULD

2 Answers

I may have found a (partial) answer.

I found it in the Pytorch documentation, which appears to be much clearer than the Keras documentation on this topic.

When applying a Conv2D with a stride greater than 1 to images whose dimensions are close, we get output images with the same dimensions.

For instance, when applying a Conv2D with a kernel size of 3x3, a stride of 7x7 and padding "same", the following input dimensions

22x22, 23x23, ..., 28x28, 22x28, 28x22, 27x24, etc. (7x7 = 49 combinations)

will ALL yield an output dimension of 4x4.

That is because output_dimension = ceiling(input_dimension / stride).

As a consequence, when applying a Conv2DTranspose with kernel size of 3x3, stride of 7x7 and padding "same", there is an ambiguity about the output dimension.

Any of the 49 possible output dimensions would be correct.

The parameter output_padding is a way to resolve the ambiguity by choosing explicitly the output dimension.

In my example, the minimum output size is 22x22, and output_padding provides a number of lines (between 0 and 6) to add at the bottom of the output image and a number of columns (between 0 and 6) to add at the right of the output image.

So I can get an output dimension of 24x25 if I use output_padding = (2, 3).

What I still do not understand, however, is the logic that Keras uses to choose a certain output image dimension when output_padding is not specified (when it "infers" the output shape).
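Plugging numbers into the deconv_length formulas quoted in the question (re-implemented here in plain Python, dilation = 1) makes the inference concrete: with padding "same" the inferred length is simply input * stride, which for an odd kernel corresponds to output_padding = stride - 1:

```python
# Plain-Python re-implementation of the deconv_length formulas quoted
# in the question (dilation = 1).
def deconv_length(dim_size, stride, kernel, padding, output_padding):
    if output_padding is None:  # the "inferred" output shape
        if padding == 'valid':
            return dim_size * stride + max(kernel - stride, 0)
        if padding == 'same':
            return dim_size * stride
    pad = {'same': kernel // 2, 'valid': 0, 'full': kernel - 1}[padding]
    return (dim_size - 1) * stride + kernel - 2 * pad + output_padding

# kernel 3, stride 7, padding 'same', input length 4 (cf. 22x22..28x28 above)
print(deconv_length(4, 7, 3, 'same', None))  # 28 -> inferred = input * stride
print(deconv_length(4, 7, 3, 'same', 0))     # 22 -> minimum output size
print(deconv_length(4, 7, 3, 'same', 2))     # 24
print(deconv_length(4, 7, 3, 'same', 6))     # 28 -> same as the inferred value
```

So among the 7 candidate lengths (22 through 28), the inferred shape is the largest one, input * stride.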

A few pointers:

  • https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d
  • https://discuss.pytorch.org/t/the-output-size-of-convtranspose2d-differs-from-the-expected-output-size/1876/5
  • https://discuss.pytorch.org/t/question-about-the-output-padding-in-nn-convtrasnpose2d/19740
  • https://discuss.pytorch.org/t/what-does-output-padding-exactly-do-in-convtranspose2d/2688

So to answer my own questions:

  • the meaning of the "output_padding" parameter: see above
  • the interactions between parameters "padding" and "output_padding": these parameters are independent
  • the various formulas in the function keras.conv_utils.deconv_length
    • For now, I do not understand the part where output_padding is None;
    • I disregard the case where padding == 'full' (not supported by Conv2DTranspose);
    • The formula for padding == 'valid' seems correct (it can be obtained by reversing the formula of Conv2D);
    • The formula for padding == 'same' seems incorrect to me when kernel_size is even. (As a matter of fact, Keras crashes when trying to build a Conv2DTranspose layer with input_dimension = 5x5, kernel_size = 2x2, stride = 7x7 and padding = 'same'. It appears to me that there is a bug in Keras; I will start another thread on this topic...)
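The even-kernel problem in the last bullet can be seen numerically with the same formulas quoted above (a plain-Python sketch): with padding 'same' the inferred length is input * stride, but for an even kernel no value of output_padding in [0, stride) can reach that length.

```python
# Inferred length for padding='same' (per deconv_length quoted above)
def inferred_same(in_size, stride):
    return in_size * stride

# Explicit length for padding='same' with a given output_padding
def explicit_same(in_size, stride, kernel, output_padding):
    pad = kernel // 2
    return (in_size - 1) * stride + kernel - 2 * pad + output_padding

# input 5, kernel 2 (even), stride 7: the inferred length is 35 ...
print(inferred_same(5, 7))                            # 35
# ... but the explicit lengths only cover 28..34:
print([explicit_same(5, 7, 2, p) for p in range(7)])  # [28, 29, ..., 34]
```

For an odd kernel, kernel - 2*(kernel // 2) = 1 and the gap closes at output_padding = stride - 1; for an even kernel the two branches of the formula are mutually inconsistent, which is consistent with the crash described above.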
Answered Dec 26 '22 by Julien REINAULD


output_padding in Conv2DTranspose is also what I was concerned about when designing an autoencoder.

Assume the stride is always 1. Along the encoder path, for each convolution layer I choose padding='valid', which means that if my input image is H x W and the filter is sized m x n, the output of the layer will be (H-(m-1)) x (W-(n-1)).

In the corresponding Conv2DTranspose layer along the decoder path, if I use Theano, then in order to restore the input size of its corresponding Conv2D, I have to choose padding='full' and output_padding = None or 0 (no difference), which implies the input will be expanded by [m-1, n-1] around it, that is, (m-1)/2 at the top and bottom and (n-1)/2 at the left and right.

If I use TensorFlow, I will have to choose padding = 'same' and output_padding = 2*((filter_size-1)//2); I think that is Keras' intended behaviour.

If the stride is not 1, then you will have to calculate carefully how much output padding is to be added.

In Conv2D, out_size = floor((in_size + 2*padding_size - filter_size) / stride) + 1

If we choose padding = 'same', Keras will automatically set padding_size = (filter_size-1)/2; whilst if we choose 'valid', padding_size will be set to 0, which is the convention for any N-D convolution.

Conversely, in Conv2DTranspose, out_size = (in_size-1)*stride + filter_size - 2*padding_size

where padding_size refers to how many pixels are actually padded, as a result of the 'padding' option and output_padding together. Based upon the discussion above, since there is no 'full' option in TensorFlow, we will have to use output_padding to restore the input size of the corresponding Conv2D.
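The two shape formulas in this answer can be sanity-checked together in plain Python (just the arithmetic, not real Keras calls), both for the stride-1 'valid' round trip and for a strided case where output_padding is needed:

```python
import math

# Conv2D output size, per the formula in this answer
def conv2d_out(in_size, filter_size, stride, padding_size):
    return math.floor((in_size + 2 * padding_size - filter_size) / stride) + 1

# Conv2DTranspose output size, per the formula in this answer
def conv2d_transpose_out(in_size, filter_size, stride, padding_size,
                         output_padding=0):
    return (in_size - 1) * stride + filter_size - 2 * padding_size + output_padding

# 'valid' conv (padding_size = 0), stride 1: 28 -> 26, transpose: 26 -> 28
h = conv2d_out(28, 3, 1, 0)
print(h)                                      # 26
print(conv2d_transpose_out(h, 3, 1, 0))       # 28 -- input size recovered

# stride 2: 28 -> 13; the plain transpose gives 27, output_padding=1 gives 28
h2 = conv2d_out(28, 3, 2, 0)
print(h2)                                     # 13
print(conv2d_transpose_out(h2, 3, 2, 0))      # 27 -- one pixel short
print(conv2d_transpose_out(h2, 3, 2, 0, 1))   # 28 -- recovered via output_padding
```

The strided case shows exactly why output_padding exists: without it, the transpose formula lands on only one of the several input sizes that the strided Conv2D collapsed together.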

Could you try and see if it works properly and let me know, please?

So, in summary, I think output_padding is used to facilitate different backends.

Answered Dec 26 '22 by Theron