Understanding the output shape of Keras Conv2DTranspose

I am having a hard time understanding the output shape of keras.layers.Conv2DTranspose.

Here is the prototype:

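    keras.layers.Conv2DTranspose(filters, kernel_size,
                                 strides=(1, 1),
                                 padding='valid',
                                 output_padding=None,
                                 data_format=None,
                                 dilation_rate=(1, 1),
                                 activation=None,
                                 use_bias=True,
                                 kernel_initializer='glorot_uniform',
                                 bias_initializer='zeros',
                                 kernel_regularizer=None,
                                 bias_regularizer=None,
                                 activity_regularizer=None,
                                 kernel_constraint=None,
                                 bias_constraint=None)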

In the documentation (https://keras.io/layers/convolutional/), I read:

If output_padding is set to None (default), the output shape is inferred.

In the code (https://github.com/keras-team/keras/blob/master/keras/layers/convolutional.py), I read:

    out_height = conv_utils.deconv_length(height,
                                          stride_h, kernel_h,
                                          self.padding,
                                          out_pad_h,
                                          self.dilation_rate[0])
    out_width = conv_utils.deconv_length(width,
                                         stride_w, kernel_w,
                                         self.padding,
                                         out_pad_w,
                                         self.dilation_rate[1])
    if self.data_format == 'channels_first':
        output_shape = (batch_size, self.filters, out_height, out_width)
    else:
        output_shape = (batch_size, out_height, out_width, self.filters)

and (https://github.com/keras-team/keras/blob/master/keras/utils/conv_utils.py):

    def deconv_length(dim_size, stride_size, kernel_size, padding,
                      output_padding, dilation=1):
        """Determines output length of a transposed convolution given input length.

        # Arguments
            dim_size: Integer, the input length.
            stride_size: Integer, the stride along the dimension of `dim_size`.
            kernel_size: Integer, the kernel size along the dimension of `dim_size`.
            padding: One of `"same"`, `"valid"`, `"full"`.
            output_padding: Integer, amount of padding along the output dimension, can be set to `None` in which case the output length is inferred.
            dilation: dilation rate, integer.

        # Returns
            The output length (integer).
        """
        assert padding in {'same', 'valid', 'full'}
        if dim_size is None:
            return None

        # Get the dilated kernel size
        kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)

        # Infer length if output padding is None, else compute the exact length
        if output_padding is None:
            if padding == 'valid':
                dim_size = dim_size * stride_size + max(kernel_size - stride_size, 0)
            elif padding == 'full':
                dim_size = dim_size * stride_size - (stride_size + kernel_size - 2)
            elif padding == 'same':
                dim_size = dim_size * stride_size
        else:
            if padding == 'same':
                pad = kernel_size // 2
            elif padding == 'valid':
                pad = 0
            elif padding == 'full':
                pad = kernel_size - 1

            dim_size = ((dim_size - 1) * stride_size + kernel_size - 2 * pad +
                        output_padding)

        return dim_size

I understand that Conv2DTranspose is kind of a Conv2D, but reversed.

Since applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 200x200 image will output a 20x20 image, I assume that applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 20x20 image will output a 200x200 image.

Likewise, applying the same Conv2D (kernel_size = (3, 3), strides = (10, 10), padding = "same") to a 195x195 image also outputs a 20x20 image.

So I understand that there is a kind of ambiguity about the output shape when applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" (the user might want the output to be 195x195, 200x200, or one of several other compatible shapes).
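A quick way to see this ambiguity is to enumerate it in plain Python. This sketch assumes that a 'same'-padded Conv2D produces ceil(input / stride) outputs per dimension (the kernel size does not enter), and it copies the padding == 'same' case of deconv_length for output_padding=None. The helper names are hypothetical, not Keras functions.

```python
def conv_same_length(in_size, stride):
    # Forward 'same'-padded convolution length: ceil(in_size / stride), in integer arithmetic.
    return -(-in_size // stride)


def compatible_inputs_same(out_size, stride):
    # Every input length that a 'same'-padded convolution maps to out_size:
    # ceil(x / stride) == out_size exactly when (out_size - 1) * stride < x <= out_size * stride.
    return list(range((out_size - 1) * stride + 1, out_size * stride + 1))


def inferred_same(in_size, stride):
    # The padding == 'same' case of deconv_length when output_padding is None.
    return in_size * stride


candidates = compatible_inputs_same(20, 10)
print(candidates[0], candidates[-1])                        # 191 200
print(conv_same_length(195, 10), conv_same_length(200, 10))  # 20 20
print(inferred_same(20, 10))                                 # 200
```

Under these assumptions all ten lengths from 191 to 200 shrink to 20, and the inferred transposed length, 200, is the largest of them. Plugging kernel_size = 3 (so pad = 1) into the output_padding branch of deconv_length gives 191 + output_padding, so 195x195 would correspond to output_padding = (4, 4).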

I assume that "the output shape is inferred" means that a default output shape is computed from the parameters of the layer, and I assume that there is a mechanism to specify an output shape different from the default one, if necessary.

That said, I do not really understand:

  • the meaning of the "output_padding" parameter

  • the interactions between parameters "padding" and "output_padding"

  • the various formulas in the function keras.utils.conv_utils.deconv_length

Could someone explain this?

Many thanks,


2 Answers

I may have found a (partial) answer.

I found it in the PyTorch documentation, which appears to be much clearer than the Keras documentation on this topic.

When applying a Conv2D with a stride greater than 1 to images whose dimensions are close, we get output images with the same dimensions.

For instance, when applying a Conv2D with a kernel size of 3x3, a stride of 7x7 and padding "same", the following image dimensions

22x22, 23x23, ..., 28x28, 22x28, 28x22, 27x24, etc. (7x7 = 49 combinations)

will ALL yield an output dimension of 4x4.

That is because output_dimension = ceiling(input_dimension / stride).
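The count of 49 can be reproduced by brute force. A minimal sketch, assuming the ceil(input_dimension / stride) rule above and searching heights and widths from 1 to 60 (an arbitrary bound); conv_same_length is a hypothetical helper, not a Keras function.

```python
from itertools import product

STRIDE = 7


def conv_same_length(in_size, stride):
    # ceil(in_size / stride), computed with integer arithmetic
    return -(-in_size // stride)


# All (height, width) pairs with sides 1..60 that a stride-7 'same' Conv2D maps to 4x4
shapes = [
    (h, w)
    for h, w in product(range(1, 61), repeat=2)
    if conv_same_length(h, STRIDE) == 4 and conv_same_length(w, STRIDE) == 4
]

print(len(shapes))               # 49
print(min(shapes), max(shapes))  # (22, 22) (28, 28)
```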

As a consequence, when applying a Conv2DTranspose with kernel size of 3x3, stride of 7x7 and padding "same", there is an ambiguity about the output dimension.

Any of the 49 possible output dimensions would be correct.

The parameter output_padding is a way to resolve the ambiguity by explicitly choosing the output dimension.

In my example, the minimum output size is 22x22, and output_padding gives the number of rows (between 0 and 6) to add at the bottom of the output image and the number of columns (between 0 and 6) to add at the right of the output image.

So I can get output dimensions of 24x25 by using output_padding = (2, 3).
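The 22x22 minimum and the 24x25 result follow from the output_padding branch of deconv_length quoted in the question. The sketch below copies that branch with dilation fixed at 1 (an assumption for brevity); deconv_length_explicit and conv_same_length are hypothetical names, not Keras API.

```python
def deconv_length_explicit(dim_size, stride, kernel_size, padding, output_padding):
    # Copy of the branch of deconv_length used when output_padding is given (dilation = 1).
    pad = {'same': kernel_size // 2, 'valid': 0, 'full': kernel_size - 1}[padding]
    return (dim_size - 1) * stride + kernel_size - 2 * pad + output_padding


def conv_same_length(in_size, stride):
    # ceil(in_size / stride): forward 'same' Conv2D length, used to check the round trip
    return -(-in_size // stride)


# 4x4 input, kernel 3x3, stride 7x7, padding 'same', output_padding (2, 3)
out_h = deconv_length_explicit(4, 7, 3, 'same', 2)
out_w = deconv_length_explicit(4, 7, 3, 'same', 3)
print(out_h, out_w)  # 24 25

# Each output_padding from 0 to 6 picks one of the sizes that a stride-7 Conv2D maps back to 4.
sizes = [deconv_length_explicit(4, 7, 3, 'same', p) for p in range(7)]
print(sizes)  # [22, 23, 24, 25, 26, 27, 28]
```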

What I still do not understand, however, is the logic Keras uses to choose a particular output image dimension when output_padding is not specified (when it "infers" the output shape).

A few pointers:

  • https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d
  • https://discuss.pytorch.org/t/the-output-size-of-convtranspose2d-differs-from-the-expected-output-size/1876/5
  • https://discuss.pytorch.org/t/question-about-the-output-padding-in-nn-convtrasnpose2d/19740
  • https://discuss.pytorch.org/t/what-does-output-padding-exactly-do-in-convtranspose2d/2688

So to answer my own questions:

  • the meaning of the "output_padding" parameter: see above
  • the interactions between parameters "padding" and "output_padding": these parameters are independent
  • the various formulas in the function keras.utils.conv_utils.deconv_length
    • For now, I do not understand the branch where output_padding is None;
    • I am ignoring the case padding == 'full' (not supported by Conv2DTranspose);
    • The formula for padding == 'valid' seems correct (it can be derived by reversing the formula of Conv2D);
    • The formula for padding == 'same' seems incorrect to me when kernel_size is even. (In fact, Keras crashes when I try to build a Conv2DTranspose layer with input_dimension = 5x5, kernel_size = 2x2, stride = 7x7 and padding = 'same'. It looks to me like a bug in Keras; I will start another thread on this topic...)
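One possible reading of the output_padding=None branch is to compare the inferred length with the set of lengths that the corresponding Conv2D maps back to the input length. The sketch below is a brute-force check under assumptions: dilation 1, TensorFlow-style forward lengths ceil(n / s) for 'same' and floor((n - k) / s) + 1 for 'valid', and 'full' left out. The function names are hypothetical.

```python
def conv_length(n, k, s, padding):
    # Forward Conv2D output length (dilation 1), TensorFlow-style conventions (assumed).
    if padding == 'same':
        return -(-n // s)                       # ceil(n / s)
    return (n - k) // s + 1 if n >= k else 0    # 'valid': floor((n - k) / s) + 1


def inferred_deconv_length(n, k, s, padding):
    # The output_padding=None branch of deconv_length for 'same' and 'valid'.
    if padding == 'same':
        return n * s
    return n * s + max(k - s, 0)


def compatible_lengths(n, k, s, padding, search_limit=1000):
    # All input lengths below search_limit that the forward convolution maps to n.
    return [m for m in range(1, search_limit) if conv_length(m, k, s, padding) == n]


for n, k, s, padding in [(4, 3, 7, 'same'), (4, 3, 7, 'valid'), (4, 5, 2, 'valid')]:
    cands = compatible_lengths(n, k, s, padding)
    print(padding, k, s, (cands[0], cands[-1]), inferred_deconv_length(n, k, s, padding))
```

In these three cases the inferred length is the largest compatible size for 'same' (28 out of 22..28), n * s strictly inside the range for 'valid' with kernel_size < stride (28 out of 24..30), and the smallest compatible size, (n - 1) * s + k, for 'valid' with kernel_size >= stride (11 out of 11..12). This is an observation from the formulas, not documented Keras behaviour.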

Output padding in Conv2DTranspose is also something I am concerned about when designing an autoencoder.

Assume the stride is always 1. Along the encoder path, for each convolution layer, I choose padding='valid', which means that if my input image is H x W and the filter is m x n, the output of the layer will be (H-(m-1)) x (W-(n-1)).

In the corresponding Conv2DTranspose layer along the decoder path, if I use Theano, then to restore the input size of the corresponding Conv2D I have to choose padding='full' and output_padding = None or 0 (no difference), which means the input is expanded by [m-1, n-1] around it, that is, (m-1)/2 at the top and bottom and (n-1)/2 at the left and right.

If I use TensorFlow, I have to choose padding = 'same' and output_padding = 2*((filter_size-1)//2). I think that is Keras' intended behaviour.

If the stride is not 1, then you will have to carefully calculate how much output padding to add.

In Conv2D, out_size = floor((in_size + 2*padding_size - filter_size) / stride) + 1

If we choose padding = 'same', Keras automatically sets padding_size = (filter_size-1)/2, whereas if we choose 'valid', padding_size is set to 0, which is the usual convention for N-D convolutions.

Conversely, in Conv2DTranspose, out_size = (in_size-1)*stride + filter_size - 2*padding_size

where padding_size is the number of pixels actually padded as a result of the 'padding' option and output_padding together. Based on the discussion above, since there is no 'full' option in TensorFlow, we have to use output_padding to restore the input size of the corresponding Conv2D.
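The two formulas above can be combined to compute how much output padding a round trip needs. A sketch, assuming symmetric padding p = (filter_size - 1) / 2 = 1 for a 3x3 kernel and floor division in the Conv2D formula; the helper names are hypothetical and this is not Keras code.

```python
def conv_out(in_size, k, s, p):
    # Conv2D: floor((in_size + 2*p - k) / s) + 1
    return (in_size + 2 * p - k) // s + 1


def tconv_out(in_size, k, s, p, output_padding=0):
    # Conv2DTranspose: (in_size - 1)*s + k - 2*p, plus any output_padding
    return (in_size - 1) * s + k - 2 * p + output_padding


def output_padding_needed(original, k, s, p):
    # Extra rows/columns needed to get back `original` after Conv2D then Conv2DTranspose
    return original - tconv_out(conv_out(original, k, s, p), k, s, p)


# kernel 3, padding p = (3 - 1) // 2 = 1
for original in (7, 8):
    for s in (1, 2):
        print(original, s, conv_out(original, 3, s, 1), output_padding_needed(original, 3, s, 1))
```

With stride 1 both 7 and 8 come back unchanged, while with stride 2 both shrink to 4 and only the length 8 needs output_padding = 1, which is the ambiguity discussed in the question.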

Could you try it and let me know whether it works properly?

So, in summary, I think output_padding exists to accommodate the different backends.

