
tensorflow - understanding tensor shapes for convolution

Currently trying to work my way through the TensorFlow MNIST tutorial for convolutional networks, and I could use some help understanding the dimensions of the darn tensors.

So we have images of 28x28 pixels in size.

The convolution will compute 32 features for each 5x5 patch.

Let's just accept this, for now, and ask ourselves later why 32 features and why 5x5 patches.

Its weight tensor will have a shape of [5, 5, 1, 32]. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels.

W_conv1 = weight_variable([5, 5, 1, 32])

b_conv1 = bias_variable([32])
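As a quick sanity check on those shapes (a NumPy sketch, not part of the tutorial; only the shapes matter here, so zeros stand in for the tutorial's `weight_variable` / `bias_variable` initializers): the first layer holds 5·5·1·32 = 800 weights plus 32 biases.

```python
import numpy as np

# Stand-ins for the tutorial's weight_variable / bias_variable;
# only the shapes matter for this check.
W_conv1 = np.zeros((5, 5, 1, 32))  # [patch_h, patch_w, in_channels, out_channels]
b_conv1 = np.zeros(32)             # one bias per output channel

print(W_conv1.size)  # 800 weights: 5 * 5 * 1 * 32
print(b_conv1.size)  # 32 biases
```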

If you say so ...

To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions corresponding to image width and height, and the final dimension corresponding to the number of color channels.

x_image = tf.reshape(x, [-1,28,28,1])
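To see what that reshape does, here is the same operation sketched with NumPy (which follows the same reshape semantics): a batch of flat 784-pixel vectors becomes a stack of 28x28x1 blocks, and the -1 lets the batch size be inferred.

```python
import numpy as np

x = np.arange(3 * 784).reshape(3, 784)  # a toy batch of 3 flattened images
x_image = x.reshape(-1, 28, 28, 1)      # -1: infer the batch size (here 3)

print(x_image.shape)  # (3, 28, 28, 1): batch, height, width, channels
```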

Alright, now I'm getting lost.

Judging by this last reshape, we have "however many" 28x28x1 "blocks" of pixels that are our images.

I guess this makes sense because the images are in greyscale.

However, if that is the ordering, then our weight tensor is essentially a collection of five 5x1x32 "blocks" of values.

The x32 makes sense, I guess, if we want to infer 32 features per patch

The rest, though, I'm not terribly convinced by.

Why does the weight tensor look the way it apparently does?

(For completeness: we use them like this:

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

where

def conv2d(x, W):
    '''
    2D convolution; expects 4D input x and 4D filter tensor W
    '''
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    '''
    Max-pooling over 2x2 patches
    '''
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

)
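For what it's worth, the shapes those two ops produce can be worked out by hand: with SAME padding the output spatial size is ceil(input size / stride). A sketch of that arithmetic, assuming a batch of N images (None stands for the arbitrary batch size):

```python
import math

def same_out(size, stride):
    # SAME padding: output spatial size = ceil(input / stride)
    return math.ceil(size / stride)

# conv2d: stride 1, SAME -> spatial size unchanged; channels become 32
h_conv1_shape = (None, same_out(28, 1), same_out(28, 1), 32)
# max_pool_2x2: 2x2 patches, stride 2, SAME -> spatial size halved
h_pool1_shape = (None, same_out(28, 2), same_out(28, 2), 32)

print(h_conv1_shape)  # (None, 28, 28, 32)
print(h_pool1_shape)  # (None, 14, 14, 32)
```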

User1291 asked Jan 05 '23 01:01


1 Answer

Your input tensor has the shape [-1, 28, 28, 1]. As you mention, the last dimension is 1 because the images are in greyscale. The first dimension is the batch size. The convolution processes every image in the batch independently, so the batch size has no influence on the convolution weight tensor's dimensions, or, in fact, on any weight tensor's dimensions in the network. That is why the batch size can be arbitrary (-1 signifies an arbitrary size in TensorFlow).
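That batch-size independence is easy to see in a toy convolution (a naive NumPy sketch with VALID padding, not TensorFlow's implementation): the same [5, 5, 1, 32] kernel works for any batch size, because the batch axis is only ever carried along, never contracted against the weights.

```python
import numpy as np

def naive_conv2d_valid(x, W):
    """Naive VALID convolution: x is [N, H, W, C_in], W is [kh, kw, C_in, C_out]."""
    n, h, w, _ = x.shape
    kh, kw, _, c_out = W.shape
    out = np.empty((n, h - kh + 1, w - kw + 1, c_out))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw, :]            # [N, kh, kw, C_in]
            out[:, i, j, :] = np.einsum('nhwc,hwco->no', patch, W)
    return out

W = np.random.randn(5, 5, 1, 32)  # one kernel, reused for every batch size
print(naive_conv2d_valid(np.random.randn(1, 28, 28, 1), W).shape)  # (1, 24, 24, 32)
print(naive_conv2d_valid(np.random.randn(7, 28, 28, 1), W).shape)  # (7, 24, 24, 32)
```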

Now to the weight tensor: you don't have five 5x1x32 blocks; rather, you have 32 blocks of 5x5x1. Each one represents one feature (one filter). The 1 is the depth of the patch, and it is 1 because of the greyscale input (the shape would be [5, 5, 3, 32] for color images). The 5x5 is the spatial size of the patch.
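In other words (a NumPy sketch): slicing the *last* axis of a [5, 5, 1, 32] tensor yields one 5x5x1 filter per feature, while the asker's reading, slicing the first axis, gives a 5x1x32 cross-section through all 32 filters at once.

```python
import numpy as np

W_conv1 = np.random.randn(5, 5, 1, 32)  # [patch_h, patch_w, in_channels, features]

# One filter per feature: indexing the last axis gives a 5x5x1 block ...
first_filter = W_conv1[:, :, :, 0]
print(first_filter.shape)  # (5, 5, 1)

# ... whereas indexing the first axis gives 5x1x32: a slice across
# all 32 filters, not a filter itself.
row_slice = W_conv1[0]
print(row_slice.shape)  # (5, 1, 32)
```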

The ordering of dimensions in the data tensors is different from the ordering of dimensions in the convolution weight tensors.
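Concretely: data tensors use NHWC order (batch, height, width, channels), while conv weight tensors use HWIO order (filter height, filter width, input channels, output channels). One output pixel is the contraction of one data patch against all 32 filters at once, as this NumPy sketch shows:

```python
import numpy as np

x_image = np.random.randn(1, 28, 28, 1)  # NHWC: batch, height, width, channels
W_conv1 = np.random.randn(5, 5, 1, 32)   # HWIO: height, width, in, out

patch = x_image[0, 0:5, 0:5, :]          # one 5x5x1 patch of the first image
pixel = np.einsum('hwc,hwco->o', patch, W_conv1)
print(pixel.shape)  # (32,): one value per output channel / feature
```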

BlueSun answered Jan 14 '23 12:01