When I worked through the Deep MNIST TensorFlow tutorial, I ran into a problem with the output size after convolving and pooling the input image. In the tutorial we can see:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool. The max_pool_2x2 method will reduce the image size to 14x14.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
I think two steps are applied to the input image: first convolution, then max pooling. After convolution, the output size is (28-5+1)*(28-5+1) = 24*24, so the input to max pooling is 24*24. If the pool size is 2*2, the output size is (24/2)*(24/2) = 12*12 rather than 14*14. Does that make sense? Please explain in detail how to calculate the output size after convolution and pooling. Thanks a lot. The following image shows the CNN process from a paper.
[Image: the CNN process]
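The arithmetic in the question can be reproduced with a small helper, assuming no zero padding at all (what TensorFlow calls 'VALID'). The name valid_output_size is hypothetical and used only for illustration:

```python
def valid_output_size(size: int, kernel: int, stride: int) -> int:
    """Output length along one axis when no padding is added ('VALID')."""
    return (size - kernel) // stride + 1

# The reasoning in the question: 5x5 convolution with stride 1,
# followed by 2x2 max pooling with stride 2, on a 28x28 image.
conv_out = valid_output_size(28, kernel=5, stride=1)        # 24
pool_out = valid_output_size(conv_out, kernel=2, stride=2)  # 12
print(conv_out, pool_out)  # 24 12
```

So the 12x12 result is correct, but only under the assumption that neither step pads its input.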
I think I now understand where the problem is:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
padding='SAME' means the output size is the same as the input size (the image size) when the stride is 1. So after convolution the output size is 28*28, and after pooling the final output size is (28/2)*(28/2) = 14*14. But then how should padding='SAME' in the following code be explained?
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1], padding='SAME')
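With padding='SAME', TensorFlow first fixes the output size as ceil(input / stride) and then pads just enough for every output position to get a full window; the kernel size only affects how much padding is needed, not the output size. A minimal pure-Python sketch of that rule follows (same_output_size and same_padding are hypothetical helper names, not TensorFlow API):

```python
import math
from typing import Tuple


def same_output_size(size: int, stride: int) -> int:
    """Output length along one axis for padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)


def same_padding(size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """Padding (before, after) along one axis so every output position has a full window."""
    out = same_output_size(size, stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2  # when the total is odd, the extra position goes after
    return before, total - before


# conv2d: 5x5 kernel, stride 1 -> 28 stays 28, two padded positions on each side
print(same_output_size(28, 1), same_padding(28, 5, 1))  # 28 (2, 2)
# max_pool_2x2: 2x2 window, stride 2 -> 28 becomes 14, no padding needed
print(same_output_size(28, 2), same_padding(28, 2, 2))  # 14 (0, 0)
```

Because 28 is divisible by 2, the 2x2 pool with padding='SAME' adds no padding and gives the same result as 'VALID'. The setting only matters when the input does not divide evenly: for a 7-wide input the same rule gives an output of 4, with one padded position appended at the end.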
Let's take an example.
Input tensor size (W): (width = 28, height = 28)
Convolution filter size (F): (F_width = 5, F_height = 5)
Padding (P): 0
Padding algorithm: VALID (no zero padding, so the output is smaller than the input)
Stride (S): 1
Using the equation:
output width = ((W - F + 2P) / S) + 1
output width = ((28 - 5 + 2*0) / 1) + 1
output width = 24
The same calculation gives the output height, since both the input and the filter are square.
So the output dimension will be (24,24).
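The equation translates directly into code. In the sketch below, conv_output_size is a hypothetical name; the second call shows that padding each side with P = 2 zeros, which is what 'SAME' works out to for a 5x5 filter with stride 1, brings the output back to 28:

```python
def conv_output_size(w: int, f: int, p: int, s: int) -> int:
    """Output width from ((W - F + 2P) / S) + 1.

    Floor division is used: if (W - F + 2P) is not a multiple of S, the
    filter cannot reach the last rows/columns and they are dropped.
    """
    return (w - f + 2 * p) // s + 1


print(conv_output_size(28, 5, 0, 1))  # 24: the VALID case worked out above
print(conv_output_size(28, 5, 2, 1))  # 28: P = 2 keeps the size, as 'SAME' does here
```

More generally, for an odd filter size F and stride 1, choosing P = (F - 1) / 2 keeps the output width equal to the input width.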
However, if the padding algorithm is set to 'SAME' and the stride is 1, the size of the output is equal to the size of the original input, here 28x28.
Remember also that pooling is a form of "filter", so the same equation applies.
So a 2x2 pooling with a stride of 2, applied to the 28x28 output of the 'SAME' convolution, gives:
output width = ((28 - 2 + 2*0) / 2) + 1 = (26 / 2) + 1 = 13 + 1 = 14
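Putting both steps together for the tutorial's first layer, here is a sketch that tracks the full NHWC shape (batch, height, width, channels). The helper layer_output_shape is hypothetical. P = 2 for the convolution is assumed because that is the padding TensorFlow's 'SAME' rule yields for a 5x5 filter with stride 1 on a 28x28 input, and P = 0 for pooling because 28 divides evenly by 2. The channel count after the convolution is the 32 from W_conv1's shape [5, 5, 1, 32]:

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], int, int, int]  # (batch, height, width, channels)


def layer_output_shape(shape: Shape, f: int, p: int, s: int,
                       out_channels: Optional[int] = None) -> Shape:
    """Apply ((W - F + 2P) / S) + 1 to the height and width of an NHWC shape."""
    n, h, w, c = shape
    h_out = (h - f + 2 * p) // s + 1
    w_out = (w - f + 2 * p) // s + 1
    # A convolution sets the channel count; pooling leaves it unchanged.
    return (n, h_out, w_out, out_channels if out_channels is not None else c)


x_image = (None, 28, 28, 1)  # batch size left open, like tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = layer_output_shape(x_image, f=5, p=2, s=1, out_channels=32)
h_pool1 = layer_output_shape(h_conv1, f=2, p=0, s=2)
print(h_conv1, h_pool1)  # (None, 28, 28, 32) (None, 14, 14, 32)
```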
Here is a link to an answer I once posted on Quora:
https://www.quora.com/How-can-I-calculate-the-size-of-output-of-convolutional-layer/answer/Rockson-Agyeman
The output size of a convolutional layer depends on the padding algorithm used. As you can see in the "Convolution and Pooling" section of the tutorial, they use 'SAME' padding. That means that, with a stride of 1, the output shape is the same as the input shape, and the input is padded with zeros around its border.
Your estimate of the output shape is correct when you use the 'VALID' padding algorithm.
If you are using TensorFlow, you can find a more detailed discussion here: What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?