I noticed in a number of places that people use something like this, usually in fully convolutional networks, autoencoders, and similar:
model.add(UpSampling2D(size=(2,2)))
model.add(Conv2DTranspose(kernel_size=k, padding='same', strides=(1,1)))
I am wondering what is the difference between that and simply:
model.add(Conv2DTranspose(kernel_size=k, padding='same', strides=(2,2)))
Links towards any papers that explain this difference are welcome.
Two common types of layers that can be used in the generator model are an upsampling layer (UpSampling2D), which simply doubles the spatial dimensions of the input, and the transposed convolutional layer (Conv2DTranspose), which performs a learned upsampling that is often loosely described as an inverse convolution operation.
It is also known as an upsampled convolution, a name that reflects the task it performs, i.e., upsampling the input feature map. It is also referred to as a fractionally strided convolution, since a stride over the output is equivalent to a fractional stride over the input.
Its role is to bring the resolution back up to that of a previous layer. Theoretically, we could eliminate the down/up sampling layers altogether; however, to reduce the number of computations, we can downsample the input before a layer and then upsample its output.
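As a minimal sketch of this (assuming tensorflow.keras and an arbitrary 8x8 feature map with 16 channels, both hypothetical choices), the two layers below double the spatial resolution, but only Conv2DTranspose carries trainable weights:
import numpy as np
from tensorflow.keras.layers import Conv2DTranspose, UpSampling2D

x = np.random.rand(1, 8, 8, 16).astype("float32")    # one 8x8 feature map with 16 channels

up = UpSampling2D(size=(2, 2))                        # no weights, simply repeats pixels
tconv = Conv2DTranspose(filters=16, kernel_size=3,
                        strides=(2, 2), padding="same")  # learned upsampling

print(up(x).shape)       # (1, 16, 16, 16)
print(tconv(x).shape)    # (1, 16, 16, 16)
print(len(up.weights), len(tconv.weights))            # 0 vs 2 (kernel + bias)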
A strided convolution is another basic building block used in convolutional neural networks. Say we want to convolve a 7x7 image with a 3x3 filter, except that instead of doing it the usual way, we do it with a stride of 2: the filter jumps two pixels at a time, so the output is smaller than the input.
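To check the arithmetic of that example (a sketch using tensorflow.keras and a random 7x7 input), a stride-2 convolution with 'valid' padding shrinks the 7x7 input to 3x3, following floor((7 - 3) / 2) + 1 = 3:
import numpy as np
from tensorflow.keras.layers import Conv2D

image = np.random.rand(1, 7, 7, 1).astype("float32")  # one 7x7 single-channel image
conv = Conv2D(filters=1, kernel_size=3, strides=(2, 2), padding="valid")
print(conv(image).shape)                               # (1, 3, 3, 1)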
Here and here you can find a really nice explanation of how transposed convolutions work. To sum up both of these approaches:
In your first approach, you are first upsampling your feature map:
[[1, 2], [3, 4]] -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
and then you apply a classical convolution (as Conv2DTranspose with stride=1 and padding='same' is equivalent to Conv2D).
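Here is a sketch of this first approach on the tiny [[1, 2], [3, 4]] map (assuming tensorflow.keras): UpSampling2D repeats each value into a 2x2 block, and the stride-1, padding='same' Conv2DTranspose then behaves like an ordinary convolution over the enlarged map:
import numpy as np
from tensorflow.keras.layers import Conv2DTranspose, UpSampling2D

fmap = np.array([[1, 2], [3, 4]], dtype="float32").reshape(1, 2, 2, 1)

up = UpSampling2D(size=(2, 2))(fmap)                  # nearest-neighbour repetition
print(up.numpy()[0, :, :, 0])
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]

out = Conv2DTranspose(filters=1, kernel_size=3, strides=(1, 1), padding="same")(up)
print(out.shape)  # (1, 4, 4, 1) -- resolution fixed by the upsampling, weights are learned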
In your second approach you are first un(max)pooling your feature map:
[[1, 2], [3, 4]] -> [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
and then apply a classical convolution with the given filter_size, filters, etc.
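The zero-insertion picture can be illustrated as follows (a sketch with numpy and tensorflow.keras; Conv2DTranspose performs the equivalent spreading implicitly rather than materialising the zero-filled map):
import numpy as np
from tensorflow.keras.layers import Conv2DTranspose

fmap = np.array([[1, 2], [3, 4]], dtype="float32")

unpooled = np.zeros((4, 4), dtype="float32")          # spread the values onto a grid of zeros
unpooled[::2, ::2] = fmap
print(unpooled)
# [[1. 0. 2. 0.]
#  [0. 0. 0. 0.]
#  [3. 0. 4. 0.]
#  [0. 0. 0. 0.]]

out = Conv2DTranspose(filters=1, kernel_size=3, strides=(2, 2),
                      padding="same")(fmap.reshape(1, 2, 2, 1))
print(out.shape)  # (1, 4, 4, 1) -- the single-layer form from the question's second snippet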
A fun fact is that, although these approaches are different, they do share something in common. A transposed convolution is meant to approximate the gradient of a convolution, so the first approach approximates the gradient of sum pooling, whereas the second approximates the gradient of max pooling. This makes the first approach produce slightly smoother results.
Other reasons why you might see the first approach are:
- Conv2DTranspose (and its equivalents) are relatively new in keras, so for a while the only way to perform learnable upsampling was to combine UpSampling2D with a convolution,
- the author of keras, Francois Chollet, used this approach in one of his tutorials,
- in the past, the keras equivalents of transposed convolution appeared to be problematic due to some API inconsistencies.