 

What is the difference between upsampling combined with a stride-1 transposed convolution, and a strided transposed convolution alone?

I noticed in a number of places that people use something like this, usually in fully convolutional networks, autoencoders, and similar models:

model.add(UpSampling2D(size=(2, 2)))
model.add(Conv2DTranspose(filters=f, kernel_size=k, padding='same', strides=(1, 1)))

I am wondering what is the difference between that and simply:

model.add(Conv2DTranspose(filters=f, kernel_size=k, padding='same', strides=(2, 2)))

Links to any papers that explain this difference are welcome.

asked Jan 12 '18 by Aleksandar Jovanovic
People also ask

What is the difference between UpSampling2D and Conv2DTranspose?

Two common types of layers that can be used in the generator model are an upsampling layer (UpSampling2D), which simply doubles the dimensions of the input, and the transposed convolutional layer (Conv2DTranspose), which performs an inverse-like convolution operation.
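
A minimal sketch of this difference (assuming tensorflow.keras; the input size and filter count are arbitrary choices for illustration):

    import numpy as np
    from tensorflow.keras import layers

    x = np.random.rand(1, 8, 8, 3).astype("float32")  # one 8x8 feature map with 3 channels

    # UpSampling2D has no trainable weights: it simply repeats rows and columns.
    up = layers.UpSampling2D(size=(2, 2))(x)

    # Conv2DTranspose learns its filters while it upsamples.
    tconv = layers.Conv2DTranspose(filters=3, kernel_size=3,
                                   strides=(2, 2), padding='same')(x)

    print(up.shape, tconv.shape)  # both (1, 16, 16, 3)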

What is upsampling convolution?

It is also known as upsampled convolution, which is intuitive given the task it performs, i.e. upsampling the input feature map. It is also referred to as fractionally strided convolution, since a stride over the output is equivalent to a fractional stride over the input.

Why is upsampling used in CNN?

Its role is to bring the resolution back to that of a previous layer. Theoretically, we could eliminate the down/up sampling layers altogether. However, to reduce the number of computations, we can downsample the input before a layer and then upsample its output.
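
A toy bottleneck illustrating this (assuming tensorflow.keras; the shapes are arbitrary):

    from tensorflow import keras
    from tensorflow.keras import layers

    # Downsample before the expensive convolution, then upsample its output
    # back to the previous resolution.
    m = keras.Sequential([
        keras.Input(shape=(32, 32, 16)),
        layers.MaxPooling2D(2),                                   # 32x32 -> 16x16
        layers.Conv2D(32, 3, padding='same', activation='relu'),  # runs on 4x fewer positions
        layers.UpSampling2D(2),                                   # 16x16 -> 32x32
    ])
    m.summary()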

What is Strided convolution?

A strided convolution is another basic building block of convolutional neural networks. Say we want to convolve a 7 × 7 image with a 3 × 3 filter; instead of doing it the usual way, we do it with a stride of 2, shifting the filter by two positions at each step.
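
The output size follows floor((n - k) / s) + 1; for n = 7, k = 3, s = 2 that gives a 3 × 3 output. A quick check (assuming tensorflow.keras):

    import numpy as np
    from tensorflow.keras import layers

    x = np.ones((1, 7, 7, 1), dtype="float32")  # a 7x7 single-channel image
    y = layers.Conv2D(filters=1, kernel_size=3, strides=2, padding='valid')(x)
    print(y.shape)  # (1, 3, 3, 1): floor((7 - 3) / 2) + 1 = 3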


1 Answer

Here and here you can find a really nice explanation of how transposed convolutions work. To sum up both of these approaches:

  1. In your first approach, you are first upsampling your feature map:

    [[1, 2], [3, 4]] -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
    

    and then you apply a classical convolution (as Conv2DTranspose with stride=1 and padding='same' is equivalent to Conv2D).

  2. In your second approach, you are first un(max)pooling your feature map:

    [[1, 2], [3, 4]] -> [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
    

    and then apply a classical convolution with the given kernel_size, filters, etc. (see the NumPy sketch after this list).

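A NumPy sketch of the two pre-processing steps described above (following the 2x2 example):

    import numpy as np

    x = np.array([[1, 2],
                  [3, 4]])

    # Approach 1: nearest-neighbour upsampling (what UpSampling2D does) -
    # every value is repeated along both axes.
    upsampled = x.repeat(2, axis=0).repeat(2, axis=1)

    # Approach 2: zero-insertion (what a stride-2 transposed convolution
    # does internally before the convolution is applied).
    unpooled = np.zeros((4, 4), dtype=x.dtype)
    unpooled[::2, ::2] = x

    print(upsampled)  # [[1 1 2 2], [1 1 2 2], [3 3 4 4], [3 3 4 4]]
    print(unpooled)   # [[1 0 2 0], [0 0 0 0], [3 0 4 0], [0 0 0 0]]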

A fun fact: although these approaches are different, they do share something in common. A transposed convolution is meant to approximate the gradient of a convolution, so the first approach approximates the gradient of sum pooling, whereas the second approximates the gradient of max pooling. This makes the first approach produce slightly smoother results.
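
This gradient relationship can be checked directly (a sketch assuming TensorFlow, whose tf.nn.conv2d_transpose is documented as the gradient of tf.nn.conv2d):

    import numpy as np
    import tensorflow as tf

    kernel = tf.random.normal((3, 3, 1, 1))
    x = tf.random.normal((1, 8, 8, 1))

    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.nn.conv2d(x, kernel, strides=2, padding='SAME')
        loss = tf.reduce_sum(y)

    grad = tape.gradient(loss, x)

    # The same result via an explicit transposed convolution of ones:
    tconv = tf.nn.conv2d_transpose(tf.ones_like(y), kernel,
                                   output_shape=tf.shape(x),
                                   strides=2, padding='SAME')
    print(np.allclose(grad.numpy(), tconv.numpy()))  # True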

Other reasons why you might see the first approach are:

  • Conv2DTranspose (and its equivalents) arrived in Keras relatively recently, so for a long time the only way to perform learnable upsampling was UpSampling2D followed by a convolution,
  • the author of Keras, François Chollet, used this approach in one of his tutorials,
  • in the past, the equivalents of transposed convolution seemed to work awfully in Keras due to some API inconsistencies.
answered Nov 15 '22 by Marcin Możejko