Currently trying to work my way through the Tensorflow MNIST tutorial for convolutional networks and I could use some help with understanding the dimensions of the darn tensors.
So we have images of 28x28 pixels in size.
The convolution will compute 32 features for each 5x5 patch.
Let's just accept this, for now, and ask ourselves later why 32 features and why 5x5 patches.
Its weight tensor will have a shape of [5, 5, 1, 32]. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
If you say so ...
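(In case it matters: weight_variable and bias_variable are just the tutorial's small initialization helpers from earlier on. Roughly, assuming import tensorflow as tf, they look like this:)
def weight_variable(shape):
    '''Weights initialized with small Gaussian noise to break symmetry.'''
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    '''Slightly positive constant biases, so ReLU units start out active.'''
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)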
To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions corresponding to image width and height, and the final dimension corresponding to the number of color channels.
x_image = tf.reshape(x, [-1,28,28,1])
Alright, now I'm getting lost. Judging by this last reshape, we have "howevermany" 28x28x1 "blocks" of pixels that are our images. I guess this makes sense because the images are in greyscale.
However, if that is the ordering, then our weight tensor is essentially a collection of five 5x1x32 "blocks" of values. The x32 makes sense, I guess, if we want to infer 32 features per patch. The rest, though, I'm not terribly convinced by. Why does the weight tensor look the way it apparently does?
(For completeness: we use them
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
where
def conv2d(x, W):
    '''
    2D convolution, expects 4D input x and filter matrix W
    '''
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    '''
    max-pooling, using 2x2 patches
    '''
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
)
Your input tensor has the shape [-1,28,28,1]. Like you mention, the last dimension is 1 because the images are in greyscale. The first dimension is the batch size. The convolution processes every image in the batch independently, so the batch size has no influence on the dimensions of the convolution weight tensor (or, in fact, on any weight-tensor dimensions in the network). That is why the batch size can be arbitrary (-1 signifies an arbitrary size in TensorFlow).
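As a minimal sketch (TF1-style graph code, with a made-up batch of 50 images), you can watch the -1 being filled in at run time:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])         # flattened 28x28 images, any batch size
x_image = tf.reshape(x, [-1, 28, 28, 1])            # -1: infer the batch dimension

fake_batch = np.zeros((50, 784), dtype=np.float32)  # hypothetical batch of 50 images
with tf.Session() as sess:
    print(sess.run(tf.shape(x_image), feed_dict={x: fake_batch}))  # -> [50 28 28 1]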
Now to the weight tensor: you don't have five 5x1x32 blocks; rather, you have 32 blocks of shape 5x5x1. Each represents one feature. The 5x5 is the size of the patch, and the 1 is the depth of the patch, which is 1 because the images are greyscale (the full weight tensor would be 5x5x3x32 for color images).
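If it helps to see that in code (a sketch, reusing the W_conv1 variable from the question):
one_feature  = W_conv1[:, :, :, 0]          # shape [5, 5, 1]: a single 5x5x1 filter
all_features = tf.unstack(W_conv1, axis=3)  # a Python list of 32 such [5, 5, 1] blocks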
The ordering of dimensions in the data tensors is different from the ordering of dimensions in the convolution weight tensors.
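Side by side, with the orderings conv2d expects by default (NHWC for the data, [filter_height, filter_width, in_channels, out_channels] for the filters), the shapes work out like this; this is the question's own code with shape comments added:
x_image = tf.reshape(x, [-1, 28, 28, 1])                  # data:    [batch, height, width, in_channels]
W_conv1 = weight_variable([5, 5, 1, 32])                  # weights: [filter_height, filter_width, in_channels, out_channels]
b_conv1 = bias_variable([32])                             # one bias per output channel
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # -> [batch, 28, 28, 32] ('SAME' padding keeps 28x28)
h_pool1 = max_pool_2x2(h_conv1)                           # -> [batch, 14, 14, 32] (2x2 pooling halves height and width)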