How are the output sizes of MaxPooling2D, Conv2D, and UpSampling2D layers calculated?

I'm learning about convolutional autoencoders and I am using Keras to build an image denoiser. The following code works for building a model:

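from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, UpSampling2D

denoiser = Sequential()
denoiser.add(Conv2D(32, (3,3), input_shape=(28,28,1), padding='same'))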
denoiser.add(Activation('relu'))
denoiser.add(MaxPooling2D(pool_size=(2,2)))

denoiser.add(Conv2D(16, (3,3), padding='same'))
denoiser.add(Activation('relu'))
denoiser.add(MaxPooling2D(pool_size=(2,2)))

denoiser.add(Conv2D(8, (3,3), padding='same'))
denoiser.add(Activation('relu'))

################## HEY WHAT NO MAXPOOLING?

denoiser.add(Conv2D(8, (3,3), padding='same'))
denoiser.add(Activation('relu'))
denoiser.add(UpSampling2D((2,2)))

denoiser.add(Conv2D(16, (3,3), padding='same'))
denoiser.add(Activation('relu'))
denoiser.add(UpSampling2D((2,2)))

denoiser.add(Conv2D(1, (3,3), padding='same'))

denoiser.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
denoiser.summary()

And the following summary is given:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_155 (Conv2D)          (None, 28, 28, 32)        320       
_________________________________________________________________
activation_162 (Activation)  (None, 28, 28, 32)        0         
_________________________________________________________________
max_pooling2d_99 (MaxPooling (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_156 (Conv2D)          (None, 14, 14, 16)        4624      
_________________________________________________________________
activation_163 (Activation)  (None, 14, 14, 16)        0         
_________________________________________________________________
max_pooling2d_100 (MaxPoolin (None, 7, 7, 16)          0         
_________________________________________________________________
conv2d_157 (Conv2D)          (None, 7, 7, 8)           1160      
_________________________________________________________________
activation_164 (Activation)  (None, 7, 7, 8)           0         
_________________________________________________________________
conv2d_158 (Conv2D)          (None, 7, 7, 8)           584       
_________________________________________________________________
activation_165 (Activation)  (None, 7, 7, 8)           0         
_________________________________________________________________
up_sampling2d_25 (UpSampling (None, 14, 14, 8)         0         
_________________________________________________________________
conv2d_159 (Conv2D)          (None, 14, 14, 16)        1168      
_________________________________________________________________
activation_166 (Activation)  (None, 14, 14, 16)        0         
_________________________________________________________________
up_sampling2d_26 (UpSampling (None, 28, 28, 16)        0         
_________________________________________________________________
conv2d_160 (Conv2D)          (None, 28, 28, 1)         145       
=================================================================
Total params: 8,001
Trainable params: 8,001
Non-trainable params: 0
_________________________________________________________________

I am not sure how the MaxPooling2D, Conv2D, and UpSampling2D output sizes are calculated. I have read the Keras documentation but I am still confused. There are many parameters that affect the output shape, like stride or padding for Conv2D layers, and I do not know exactly how they affect the output shape.

I do not get why there is no MaxPooling2D layer before the commented line. Adding a convmodel3.add(MaxPooling2D(pool_size=(2,2))) layer above the comment changes the final output shape to (None, 12, 12, 1).

Adding a convmodel3.add(MaxPooling2D(pool_size=(2,2))) layer before the comment, followed by a convmodel3.add(UpSampling2D((2,2))) layer, changes the final output to (None, 24, 24, 1). Shouldn't this be (None, 28, 28, 1)? The code and summary for this:

convmodel3 = Sequential()
convmodel3.add(Conv2D(32, (3,3), input_shape=(28,28,1), padding='same')) 
convmodel3.add(Activation('relu'))
convmodel3.add(MaxPooling2D(pool_size=(2,2)))

convmodel3.add(Conv2D(16, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(MaxPooling2D(pool_size=(2,2)))

convmodel3.add(Conv2D(8, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(MaxPooling2D(pool_size=(2,2))) # ADDED MAXPOOL

################## HEY WHAT NO MAXPOOLING?

convmodel3.add(UpSampling2D((2,2))) # ADDED UPSAMPLING
convmodel3.add(Conv2D(16, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(UpSampling2D((2,2)))

convmodel3.add(Conv2D(32, (3,3), padding='same'))
convmodel3.add(Activation('relu'))
convmodel3.add(UpSampling2D((2,2)))

convmodel3.add(Conv2D(1, (3,3), padding='same'))

convmodel3.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
convmodel3.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_247 (Conv2D)          (None, 28, 28, 32)        320       
_________________________________________________________________
activation_238 (Activation)  (None, 28, 28, 32)        0         
_________________________________________________________________
max_pooling2d_141 (MaxPoolin (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_248 (Conv2D)          (None, 14, 14, 16)        4624      
_________________________________________________________________
activation_239 (Activation)  (None, 14, 14, 16)        0         
_________________________________________________________________
max_pooling2d_142 (MaxPoolin (None, 7, 7, 16)          0         
_________________________________________________________________
conv2d_249 (Conv2D)          (None, 7, 7, 8)           1160      
_________________________________________________________________
activation_240 (Activation)  (None, 7, 7, 8)           0         
_________________________________________________________________
max_pooling2d_143 (MaxPoolin (None, 3, 3, 8)           0         
_________________________________________________________________
up_sampling2d_60 (UpSampling (None, 6, 6, 8)           0         
_________________________________________________________________
conv2d_250 (Conv2D)          (None, 6, 6, 16)          1168      
_________________________________________________________________
activation_241 (Activation)  (None, 6, 6, 16)          0         
_________________________________________________________________
up_sampling2d_61 (UpSampling (None, 12, 12, 16)        0         
_________________________________________________________________
conv2d_251 (Conv2D)          (None, 12, 12, 32)        4640      
_________________________________________________________________
activation_242 (Activation)  (None, 12, 12, 32)        0         
_________________________________________________________________
up_sampling2d_62 (UpSampling (None, 24, 24, 32)        0         
_________________________________________________________________
conv2d_252 (Conv2D)          (None, 24, 24, 1)         289       
=================================================================
Total params: 12,201
Trainable params: 12,201
Non-trainable params: 0
_________________________________________________________________

What is the significance of None in the output shape?

Also, when I edit the Conv2D layers to not include padding, an error is raised:

ValueError: Negative dimension size caused by subtracting 3 from 2 for 'conv2d_240/convolution' (op: 'Conv2D') with input shapes: [?,2,2,16], [3,3,16,32].

Why?

Amp asked Jan 29 '19 14:01


1 Answer

With convolutional layers (2D here), the important points to consider are the volume of the input image (Width x Height x Depth) and the four parameters you give the layer. Those parameters are:

  • Number of filters, K
  • Filter size (spatial), F
  • Stride at which the filters move, S
  • Zero padding, P

The formula for the output shape (with the division rounded down) is given as

  1. Wnew = (W - F + 2*P)/S + 1
  2. Hnew = (H - F + 2*P)/S + 1
  3. Dnew = K

This is taken from the thread "What is the effect of tf.nn.conv2d() on an input tensor shape?", and more information about zero padding can be found there.
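The formula can be checked against the shapes in the summary with a few lines of plain Python. This is only a sketch: the helper name conv_output_size is made up for illustration, and it assumes the Keras conventions that padding='same' gives an output of ceil(W / S) and that padding='valid' means P = 0.

```python
import math


def conv_output_size(w, f, s=1, padding="valid"):
    """Output width (or height) of a Conv2D layer along one spatial dimension.

    w: input size, f: filter size, s: stride.
    Hypothetical helper for illustration, not part of Keras.
    """
    if padding == "same":
        # Assumed Keras behaviour: pad just enough that output = ceil(w / s).
        return math.ceil(w / s)
    if padding == "valid":
        # No zero padding, so P = 0 in (W - F + 2*P)/S + 1, rounded down.
        if w < f:
            raise ValueError(f"filter size {f} is larger than input size {w}")
        return (w - f) // s + 1
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_size(28, 3, padding="same"))   # 28
print(conv_output_size(28, 3, padding="valid"))  # 26
```

With padding='valid', a 3x3 filter cannot fit on a 2x2 input, which is the situation the "Negative dimension size caused by subtracting 3 from 2" error in the question reports.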

As for max pooling and upsampling, the size is affected only by the pool size and the stride. In your example, you had a pool size of (2,2) with no stride defined (so it defaults to the pool size, see https://keras.io/layers/pooling/). Max pooling takes each 2x2 block of pixels, finds the maximum of them and writes it into one pixel, converting 2x2 pixels to 1x1 pixel and halving the width and height. Upsampling works the other way around: instead of reducing a block to a single value, each pixel value is repeated over a 2x2 block, doubling the width and height.

The reason why you don't have a max pooling layer there, and why the image dimensions go wrong when you add one, is the image size at that stage. Looking at the network, the image dimensions are already [7,7,8]. Since 7 is odd, a pool size of (2,2) with a stride of 2 cannot cover the last row and column, so they are dropped and the resolution is lowered to [3,3,8]. After the upsampling layers, the dimensionality goes from 3 -> 6 -> 12 -> 24, and you've lost 4 pixels in each row and column.
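The two size sequences can be traced in plain Python. This is a minimal sketch with made-up helper names (pool_output_size, upsample_output_size, trace_sizes); it assumes the pooling stride defaults to the pool size and that no padding is applied, and it leaves out the Conv2D layers because with padding='same' and a stride of 1 they keep the width and height unchanged.

```python
def pool_output_size(w, pool=2, stride=None):
    """Size after max pooling along one dimension (no padding, rounded down)."""
    if stride is None:
        stride = pool  # assumed default: stride equals the pool size
    return (w - pool) // stride + 1


def upsample_output_size(w, size=2):
    """Size after upsampling: every row/column is repeated `size` times."""
    return w * size


def trace_sizes(start, ops):
    """Apply a list of 'pool' / 'up' steps and record the size after each."""
    sizes = [start]
    for op in ops:
        if op == "pool":
            sizes.append(pool_output_size(sizes[-1]))
        elif op == "up":
            sizes.append(upsample_output_size(sizes[-1]))
        else:
            raise ValueError(f"unknown op: {op!r}")
    return sizes


# Original model: two poolings, two upsamplings.
print(trace_sizes(28, ["pool", "pool", "up", "up"]))
# [28, 14, 7, 14, 28]

# Modified model: a third pooling on the odd size 7 loses a row and column.
print(trace_sizes(28, ["pool", "pool", "pool", "up", "up", "up"]))
# [28, 14, 7, 3, 6, 12, 24]
```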

The None in the output shape is the batch dimension. Convolutional layers expect a batch of images, so the dimensionality goes as

[Number of images, Height, Width, Depth]

The first element is given as None because the number of images per batch is not fixed when the model is built. The network accepts any batch size, and only the remaining dimensions are fixed.
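One way to read a shape such as (None, 28, 28, 1) is to treat None as a placeholder for the number of images in the batch, which can be any value. The sketch below only illustrates that reading; shape_matches is a hypothetical helper, not a Keras function.

```python
def shape_matches(layer_shape, batch_shape):
    """True if a concrete batch shape fits a shape that may contain None."""
    if len(layer_shape) != len(batch_shape):
        return False
    # None matches any size; every other entry must match exactly.
    return all(
        expected is None or expected == actual
        for expected, actual in zip(layer_shape, batch_shape)
    )


output_shape = (None, 28, 28, 1)
print(shape_matches(output_shape, (1, 28, 28, 1)))   # True: a batch of 1 image
print(shape_matches(output_shape, (64, 28, 28, 1)))  # True: a batch of 64 images
print(shape_matches(output_shape, (64, 24, 24, 1)))  # False: wrong height/width
```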

ZWang answered Oct 28 '22 13:10