How are the output sizes of MaxPooling2D, Conv2D, and UpSampling2D layers calculated?

Tags: python, deep-learning, keras, autoencoder

I'm learning about convolutional autoencoders and I am using Keras to build an image denoiser. The following code works for building a model:

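from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, UpSampling2D

denoiser = Sequential()
denoiser.add(Conv2D(32, (3,3), input_shape=(28,28,1), padding='same'))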
denoiser.add(Activation('relu'))
denoiser.add(MaxPooling2D(pool_size=(2,2)))

denoiser.add(Conv2D(16, (3,3), padding='same'))
denoiser.add(Activation('relu'))
denoiser.add(MaxPooling2D(pool_size=(2,2)))

denoiser.add(Conv2D(8, (3,3), padding='same'))
denoiser.add(Activation('relu'))

################## HEY WHAT NO MAXPOOLING?

denoiser.add(Conv2D(8, (3,3), padding='same'))
denoiser.add(Activation('relu'))
denoiser.add(UpSampling2D((2,2)))

denoiser.add(Conv2D(16, (3,3), padding='same'))
denoiser.add(Activation('relu'))
denoiser.add(UpSampling2D((2,2)))

denoiser.add(Conv2D(1, (3,3), padding='same'))

denoiser.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
denoiser.summary()

And the following summary is given:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_155 (Conv2D)          (None, 28, 28, 32)        320       
_________________________________________________________________
activation_162 (Activation)  (None, 28, 28, 32)        0         
_________________________________________________________________
max_pooling2d_99 (MaxPooling (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_156 (Conv2D)          (None, 14, 14, 16)        4624      
_________________________________________________________________
activation_163 (Activation)  (None, 14, 14, 16)        0         
_________________________________________________________________
max_pooling2d_100 (MaxPoolin (None, 7, 7, 16)          0         
_________________________________________________________________
conv2d_157 (Conv2D)          (None, 7, 7, 8)           1160      
_________________________________________________________________
activation_164 (Activation)  (None, 7, 7, 8)           0         
_________________________________________________________________
conv2d_158 (Conv2D)          (None, 7, 7, 8)           584       
_________________________________________________________________
activation_165 (Activation)  (None, 7, 7, 8)           0         
_________________________________________________________________
up_sampling2d_25 (UpSampling (None, 14, 14, 8)         0         
_________________________________________________________________
conv2d_159 (Conv2D)          (None, 14, 14, 16)        1168      
_________________________________________________________________
activation_166 (Activation)  (None, 14, 14, 16)        0         
_________________________________________________________________
up_sampling2d_26 (UpSampling (None, 28, 28, 16)        0         
_________________________________________________________________
conv2d_160 (Conv2D)          (None, 28, 28, 1)         145       
=================================================================
Total params: 8,001
Trainable params: 8,001
Non-trainable params: 0
_________________________________________________________________

I am not sure how the output sizes of MaxPooling2D, Conv2D, and UpSampling2D are calculated. I have read the Keras documentation but I am still confused. Many parameters affect the output shape, such as the stride or padding of Conv2D layers, and I do not know exactly how they affect it.

I do not get why there is no MaxPooling2D layer before the commented line. If I edit the code to include a convmodel3.add(MaxPooling2D(pool_size=(2,2))) layer above the comment, the final output shape becomes (None, 12, 12, 1).

If I edit the code to include a convmodel3.add(MaxPooling2D(pool_size=(2,2))) layer before the comment and then a convmodel3.add(UpSampling2D((2,2))) layer after it, the final output shape becomes (None, 24, 24, 1). Shouldn't this be (None, 28, 28, 1)? The code and summary for this:

convmodel3 = Sequential()
convmodel3.add(Conv2D(32, (3,3), input_shape=(28,28,1), padding='same')) 
convmodel3.add(Activation('relu'))
convmodel3.add(MaxPooling2D(pool_size=(2,2)))

convmodel3.add(Conv2D(16, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(MaxPooling2D(pool_size=(2,2)))

convmodel3.add(Conv2D(8, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(MaxPooling2D(pool_size=(2,2))) # ADDED MAXPOOL

################## HEY WHAT NO MAXPOOLING?

convmodel3.add(UpSampling2D((2,2))) # ADDED UPSAMPLING
convmodel3.add(Conv2D(16, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(UpSampling2D((2,2)))

convmodel3.add(Conv2D(32, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(UpSampling2D((2,2)))

convmodel3.add(Conv2D(1, (3,3), padding='same'))

convmodel3.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
convmodel3.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_247 (Conv2D)          (None, 28, 28, 32)        320       
_________________________________________________________________
activation_238 (Activation)  (None, 28, 28, 32)        0         
_________________________________________________________________
max_pooling2d_141 (MaxPoolin (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_248 (Conv2D)          (None, 14, 14, 16)        4624      
_________________________________________________________________
activation_239 (Activation)  (None, 14, 14, 16)        0         
_________________________________________________________________
max_pooling2d_142 (MaxPoolin (None, 7, 7, 16)          0         
_________________________________________________________________
conv2d_249 (Conv2D)          (None, 7, 7, 8)           1160      
_________________________________________________________________
activation_240 (Activation)  (None, 7, 7, 8)           0         
_________________________________________________________________
max_pooling2d_143 (MaxPoolin (None, 3, 3, 8)           0         
_________________________________________________________________
up_sampling2d_60 (UpSampling (None, 6, 6, 8)           0         
_________________________________________________________________
conv2d_250 (Conv2D)          (None, 6, 6, 16)          1168      
_________________________________________________________________
activation_241 (Activation)  (None, 6, 6, 16)          0         
_________________________________________________________________
up_sampling2d_61 (UpSampling (None, 12, 12, 16)        0         
_________________________________________________________________
conv2d_251 (Conv2D)          (None, 12, 12, 32)        4640      
_________________________________________________________________
activation_242 (Activation)  (None, 12, 12, 32)        0         
_________________________________________________________________
up_sampling2d_62 (UpSampling (None, 24, 24, 32)        0         
_________________________________________________________________
conv2d_252 (Conv2D)          (None, 24, 24, 1)         289       
=================================================================
Total params: 12,201
Trainable params: 12,201
Non-trainable params: 0
_________________________________________________________________

What is the significance of None in the output shape?

Also, when I edit the Conv2D layers so that they do not include padding, an error is raised:

ValueError: Negative dimension size caused by subtracting 3 from 2 for 'conv2d_240/convolution' (op: 'Conv2D') with input shapes: [?,2,2,16], [3,3,16,32].

Why?

asked Jan 29 '19 14:01

Amp

1 Answer

With convolutional (here 2D) layers, the important points to consider are the volume of the input image (Width x Height x Depth) and the four parameters you give the layer. Those parameters are:

Number of filters, K
Filter size (spatial), F
Stride at which the filters move, S
Zero padding, P

The output shape is given by the following formula, where the division is rounded down:

Wnew = (W - F + 2*P)/S + 1
Hnew = (H - F + 2*P)/S + 1
Dnew = K

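This is taken from the thread "what is the effect of tf.nn.conv2d() on an input tensor shape?", and more information about zero padding and such can be found there. In Keras, padding='valid' (the default) corresponds to P = 0, and padding='same' picks P so that, at stride 1, the output has the same width and height as the input. Without padding, every 3x3 convolution trims 2 pixels from the width and height, and once the feature map is only 2x2 a 3x3 filter no longer fits, which is what the "Negative dimension size" error reports.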
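As a rough illustration of the formula, here is a minimal pure-Python sketch. The helper name conv_output_size is made up for this example and is not part of Keras. It assumes a square filter and the same padding on both sides; p=1 matches padding='same' for a 3x3 filter at stride 1, and p=0 matches padding='valid'.

```python
def conv_output_size(w, f, s=1, p=0):
    """Output length along one axis: floor((w - f + 2*p) / s) + 1."""
    if w + 2 * p < f:
        # The filter does not fit inside the (padded) input. This is the
        # situation behind "Negative dimension size caused by subtracting 3 from 2".
        raise ValueError(f"filter size {f} exceeds padded input size {w + 2 * p}")
    return (w - f + 2 * p) // s + 1


# 3x3 filter, stride 1, one pixel of zero padding: the size is preserved.
same_size = conv_output_size(28, f=3, s=1, p=1)   # 28
# Same filter with no padding: each convolution trims 2 pixels.
valid_size = conv_output_size(28, f=3, s=1, p=0)  # 26
```

Calling conv_output_size(2, f=3) raises the ValueError, which mirrors what happens when an unpadded 3x3 convolution is applied to a 2x2 feature map.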

As for max pooling and upsampling, the size is affected only by the pool size and the stride. In your example, you had a pool size of (2,2) and no stride defined (so it defaults to the pool size, see here https://keras.io/layers/pooling/). Max pooling takes each 2x2 window of pixels, finds the maximum of them and puts it into one pixel, hence converting 2x2 pixels to 1x1 pixel and encoding it. With the default padding, an odd input size is rounded down, so 7 becomes 3. Upsampling works in the opposite direction: instead of taking the maximum, each pixel value is just repeated over a 2x2 block, which doubles the width and height.
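To make the two operations concrete, here is a small pure-Python sketch on a single-channel image stored as a list of lists. The function names are hypothetical and only mimic what MaxPooling2D with pool_size=(2,2) and UpSampling2D((2,2)) do with their default settings; this is not how Keras implements them.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 and no padding."""
    # Integer division drops a trailing odd row or column, as 'valid' padding does.
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [max(image[2 * r][2 * c], image[2 * r][2 * c + 1],
             image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1])
         for c in range(cols)]
        for r in range(rows)
    ]


def upsample_2x2(image):
    """Repeat every pixel twice along both axes (nearest-neighbour)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 1, 2, 3],
         [1, 1, 4, 0]]

pooled = max_pool_2x2(image)     # [[4, 8], [9, 4]]
restored = upsample_2x2(pooled)  # 4x4 again, each 2x2 block holds one repeated value
```

Note that upsampling restores the size but not the original values, and that a 7x7 input pools to 3x3 rather than 3.5x3.5.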

The reason why there is no max pooling layer at that point, and why the image dimensions go wrong when you add one, is the image size at that stage. Looking at the network, the image dimensions are already [7,7,8]. With a pool size of (2,2) and a stride of 2, another pooling layer would lower the resolution of the image to [3,3,8], because 7 is odd and the last row and column are dropped. After the three upsampling layers the size goes 3 -> 6 -> 12 -> 24, so you have lost 4 pixels in each row and column compared with the original 28.
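The arithmetic for both networks can be checked with a short sketch. The helpers pool, upsample and trace are hypothetical names for this example. The Conv2D layers are left out because, with padding='same' and stride 1, they do not change the width or height.

```python
def pool(size, pool_size=2):
    # MaxPooling2D with 'valid' padding and stride equal to the pool size
    return (size - pool_size) // pool_size + 1


def upsample(size, factor=2):
    # UpSampling2D multiplies the width and height by the factor
    return size * factor


def trace(size, steps):
    """Return the size after each step, starting with the input size."""
    sizes = [size]
    for step in steps:
        size = step(size)
        sizes.append(size)
    return sizes


# Original model: two poolings, two upsamplings.
working = trace(28, [pool, pool, upsample, upsample])
# Model with the extra pooling and upsampling pair.
with_extra_pool = trace(28, [pool, pool, pool, upsample, upsample, upsample])

print(working)          # [28, 14, 7, 14, 28]
print(with_extra_pool)  # [28, 14, 7, 3, 6, 12, 24]
```

The sizes only round-trip when every pooling step sees an even size, which is why the original model stops pooling at 7.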

The None in the output shape is the batch dimension. Convolutional layers expect a batch of images rather than a single image, so the expected dimensionality is

[Number of images, Height, Width, Depth]

The first element is given as None because the number of images is not fixed when the model is built: the same network accepts a batch of any size, and the actual number is only known when data is passed in.

answered Oct 28 '22 13:10

ZWang
