I am currently following TensorFlow's Multilayer Convolutional Network tutorial.
In the various layers, the weights are initialized as follows:
First Convolutional Layer:
W_conv1 = weight_variable([5, 5, 1, 32])
Second Convolutional Layer:
W_conv2 = weight_variable([5, 5, 32, 64])
Densely Connected Layer:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Readout Layer:
W_fc2 = weight_variable([1024, 10])
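For reference, the tutorial defines the weight_variable helper essentially as follows (TF 1.x API), so the shape argument is the only thing each layer varies:

import tensorflow as tf

def weight_variable(shape):
    # Small random initial values (truncated normal), as in the tutorial;
    # the question below is about how the shape argument is chosen per layer.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)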
So my doubt is: how are the shapes of the above weight variables known to us?
Is there any math used to find those shapes?
The answer is explained on the same page:
"The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]."
There is no math involved per se, but these terms need explanation:

5x5: This means there is a 5x5 matrix that is convolved with the input image by moving it around the image. Check this link for an explanation of how a small 5x5 matrix moves over a 28x28 image and multiplies different cells of the image matrix with itself. This gives us the first two dimensions of [5, 5, 1, 32].

1: These are black-and-white images, hence one input channel. Most colored images have 3 channels, so expect a 3 in other convolutional networks working on images. Indeed, for the second layer, W_conv2, the number of input channels is 32, the same as the number of output channels of layer 1.

32: Take that same 5x5 matrix and replicate it 32 times! Each of these 32 copies is called a channel. Each of the 32 5x5 matrices is initialized with random weights and trained independently during the forward/back propagation of the network. More channels learn different aspects of the image and hence give extra power to your network.

If you summarize these 3 points, you get the desired dimensions of layer 1. Subsequent layers are an extension: the first two dimensions are the kernel size (5x5 in this case); the third dimension equals the number of input channels, which equals the number of output channels of the previous layer (32, since we declared 32 output channels for layer 1); the final dimension is the number of output channels of the current layer (64, even larger for the second layer! Again, keeping a large number of independent 5x5 kernels helps).
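To see these shapes in action, here is a minimal sketch using the TF 1.x API from the tutorial (stride 1 and 'SAME' padding, so convolution preserves the 28x28 spatial size; pooling is omitted here and discussed below):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # batch of 28x28 grayscale images
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))

# 32 independent 5x5 kernels slide over the single input channel
h_conv1 = tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME')
print(h_conv1.get_shape())  # (?, 28, 28, 32)

# layer 2: its 32 input channels must match layer 1's 32 output channels
h_conv2 = tf.nn.conv2d(h_conv1, W_conv2, strides=[1, 1, 1, 1], padding='SAME')
print(h_conv2.get_shape())  # (?, 28, 28, 64)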
Finally, the last two layers. The final dense layer is the only thing that involves some calculation. Each 2x2 max-pooling step halves the spatial dimensions, so:

28 x 28 -> 14 x 14 (after the first pooling)
14 x 14 -> 7 x 7 (after the second pooling)

And of course, we have 64 channels due to conv2; pooling doesn't affect them. Hence, we get a final dense input of 7x7x64. We then create a fully connected layer with 1024 hidden units and add 10 output classes for the 10 digits.
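As a sanity check on that arithmetic, here is a tiny plain-Python sketch of the shape bookkeeping (nothing TensorFlow-specific about it):

size = 28          # input images are 28x28
size //= 2         # 2x2 max-pool after conv1: 28 -> 14
size //= 2         # 2x2 max-pool after conv2: 14 -> 7

dense_in = size * size * 64   # 7 * 7 * 64 = 3136 flattened features
print(dense_in)               # 3136, the first dimension of W_fc1

# W_fc1: [3136, 1024] maps the flattened features to 1024 hidden units
# W_fc2: [1024, 10] maps those 1024 units to the 10 digit classes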