Weights in a convolutional network?

I am currently following TensorFlow's Multilayer Convolutional Network tutorial.

In various layers, the weights are initialized as follows:

  • First Convolutional Layer:

    W_conv1 = weight_variable([5, 5, 1, 32])
    
  • Second Convolutional Layer:

    W_conv2 = weight_variable([5, 5, 32, 64])
    
  • Densely Connected Layer:

    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    
  • Readout Layer:

    W_fc2 = weight_variable([1024, 10])
    

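Here weight_variable is the tutorial's helper that creates a variable of the given shape (reproduced roughly from memory, so the exact stddev may differ):

    import tensorflow as tf

    def weight_variable(shape):
        # Small truncated-normal values break symmetry between kernels.
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)
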
So my doubt is: how are the shapes of the above weight variables known to us?

Is there any math used to find their shapes?

asked Jan 12 '16 by turtle


1 Answer

The answer is explained on the same page:

The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]

There is no involved math per se, but these terms need explanation:

  1. The size of the convolution kernel is 5x5. That means there is a 5x5 matrix that is convolved with the input image by moving it around the image. Check this link for an explanation of how a small 5x5 matrix moves over a 28x28 image and multiplies different cells of the image matrix with itself. This gives us the first two dimensions of [5, 5, 1, 32].
  2. The number of input channels is 1. These are black-and-white images, hence one input channel. Most colored images have 3 channels, so expect a 3 in other convolutional networks working on images. Indeed, for the second layer, W_conv2, the number of input channels is 32, the same as the number of output channels of layer 1.
  3. The last dimension of the weight matrix is perhaps the hardest to visualize. Imagine your 5x5 matrix, and replicate it 32 times! Each of these 32 copies is called a channel. To complete the discussion, each of these 32 5x5 matrices is initialized with random weights and trained independently during forward/back propagation of the network. More channels learn different aspects of the image and hence give extra power to your network. (The sketch after this list puts these three points together.)
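Putting the three points together, a minimal sketch (TensorFlow 1.x, as in the tutorial; variable names are mine) showing how a [5, 5, 1, 32] weight tensor maps a 1-channel image to 32 feature maps:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # batch of 28x28 grayscale images

    # 32 independent 5x5 kernels, each reading 1 input channel
    W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))

    # 'SAME' padding keeps the spatial size at 28x28; only the channel count changes
    h_conv1 = tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME')
    print(h_conv1.shape)  # (?, 28, 28, 32)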

If you summarize these 3 points, you get the desired dimensions for layer 1. Subsequent layers are an extension: the first two dimensions are the kernel size (5x5 in this case); the third dimension equals the number of input channels, which is the number of output channels of the previous layer (32, since we declared 32 output channels for layer 1); and the final dimension is the number of output channels of the current layer (64, even larger for the second layer! Again, keeping a large number of independent 5x5 kernels helps).
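The chaining is easy to check directly; a sketch, with a placeholder standing in for layer 1's pooled output:

    import tensorflow as tf

    # Stand-in for layer 1's pooled output: 14x14 spatial size, 32 channels
    h_pool1 = tf.placeholder(tf.float32, [None, 14, 14, 32])

    # The third dimension (32) must match h_pool1's channel count;
    # the fourth (64) is our choice of output channels for this layer
    W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
    h_conv2 = tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME')
    print(h_conv2.shape)  # (?, 14, 14, 64)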

Finally, the last two layers: the final dense layer is the only one that involves some calculation, based on two rules (encoded in the sketch after this list):

  1. For each convolution layer (with 'SAME' padding, as in the tutorial), output size = input size
  2. For a pooling layer of size kxk (with stride k), output size = input size / k
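The two rules are trivial to encode (plain Python, my own naming), which lets us verify the sizes listed below:

    def conv_same(size):
        # convolution with 'SAME' padding and stride 1 preserves spatial size
        return size

    def pool(size, k):
        # k x k pooling with stride k divides spatial size by k
        return size // k

    size = 28                 # input images are 28x28
    size = conv_same(size)    # conv1 -> 28
    size = pool(size, 2)      # pool1 -> 14
    size = conv_same(size)    # conv2 -> 14
    size = pool(size, 2)      # pool2 -> 7
    print(size)               # 7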

So,

  1. For conv1, the size remains 28x28
  2. pool1 reduces the size to 14x14
  3. For conv2, the size remains 14x14
  4. pool2 reduces the size to 7x7

And of course, we have 64 channels due to conv2; pooling doesn't affect them. Hence, we get a final dense input of 7x7x64 = 3136 values per image. We then create a fully connected layer with 1024 hidden units, and add 10 output classes for the 10 digits.
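Putting the whole network together, a condensed shape trace (TensorFlow 1.x; biases, non-linearities, dropout, and the training step are omitted since they don't change the shapes):

    import tensorflow as tf

    def weight_variable(shape):
        # same helper as in the tutorial: small truncated-normal initial values
        return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

    def max_pool_2x2(t):
        # 2x2 max-pooling with stride 2 halves each spatial dimension
        return tf.nn.max_pool(t, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    x = tf.placeholder(tf.float32, [None, 28, 28, 1])                             # (?, 28, 28, 1)

    W_conv1 = weight_variable([5, 5, 1, 32])
    h_pool1 = max_pool_2x2(tf.nn.conv2d(x, W_conv1, [1, 1, 1, 1], 'SAME'))        # (?, 14, 14, 32)

    W_conv2 = weight_variable([5, 5, 32, 64])
    h_pool2 = max_pool_2x2(tf.nn.conv2d(h_pool1, W_conv2, [1, 1, 1, 1], 'SAME'))  # (?, 7, 7, 64)

    h_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])   # flatten: 7*7*64 = 3136 values per image
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    h_fc1 = tf.matmul(h_flat, W_fc1)                 # (?, 1024) hidden units

    W_fc2 = weight_variable([1024, 10])
    y = tf.matmul(h_fc1, W_fc2)                      # (?, 10) one logit per digit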

answered Nov 11 '22 by Sudeep Juvekar