While going through the autoencoder tutorial on the Keras blog, I saw that the author uses 'same' padding in the max pooling layers of the Convolutional Autoencoder part, as shown below.
x = MaxPooling2D((2, 2), padding='same')(x)
Could someone explain the reason behind this? With max pooling we want to reduce the height and width, so why use 'same' padding, which keeps the height and width the same?
In addition, this code does halve the spatial dimensions, so the 'same' padding doesn't seem to be doing what its name suggests.
The padding type is called SAME because, when stride=1, the output size is the same as the input size. Using 'SAME' ensures that the filter is applied to all the elements of the input, including those at the edges, which is why padding is normally set to "SAME" while training a model: the output size stays mathematically convenient for further computation.
With the VALID option there is no padding at all: the pooling window is placed only at positions where it fits entirely inside the input, starting at (0, 0) and selecting the maximum value from each overlapping region, so any leftover rows or columns at the edges are simply dropped.
"same" results in padding with zeros evenly to the left/right or up/down of the input. When padding="same" and strides=1 , the output has the same size as the input.
Same Padding: In this case, we add 'p' padding layers such that the output image has the same dimensions as the input image.
From https://keras.io/layers/convolutional/
"same" results in padding the input such that the output has the same length as the original input.
From https://keras.io/layers/pooling/
pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.
So, first let's ask: why use padding at all? In the convolutional kernel context it matters because we want every pixel, including those at the edges and corners, to sit at the "center" of the kernel at some point; there could be important features there that the kernel would otherwise miss. So we pad around the edges for Conv2D, and as a result it returns an output the same size as the input.
In the case of the MaxPooling2D layer we pad for a similar reason, but here the stride defaults to the pooling size. Since your pooling size is 2, your image will be halved each time it goes through a pooling layer, with 'same' padding rounding odd dimensions up rather than truncating them.
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
So in the case of your tutorial example, your image dimensions go 28 -> 14 -> 7 -> 4, with each arrow representing a pooling layer. Note that the last step, 7 -> 4, only works out because 'same' padding rounds up (ceil(7/2) = 4); with 'valid' padding you would get floor(7/2) = 3 and lose the bottom row and right column.
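That 28 -> 14 -> 7 -> 4 shrinkage can be reproduced by repeatedly applying the SAME-padding size rule ceil(n / 2). A minimal sketch (the helper name `trace_pooling` is hypothetical, not part of Keras):

```python
import math

def trace_pooling(size, num_layers, pool=2):
    """Trace one spatial dimension through repeated MaxPooling2D layers
    with padding='same' and strides defaulting to pool_size,
    i.e. size -> ceil(size / pool) at each layer."""
    sizes = [size]
    for _ in range(num_layers):
        sizes.append(math.ceil(sizes[-1] / pool))
    return sizes

print(trace_pooling(28, 3))  # [28, 14, 7, 4]
```

The final 4x4 spatial grid with 8 channels is exactly the 4 * 4 * 8 = 128-dimensional encoded representation mentioned in the tutorial's comment.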