At the moment I have some networks doing classification on grayscale images, and I want to move on to colored (RGB) images.
In TensorFlow's CIFAR-10 tutorial I got confused by the weights for the convolution kernels. The first convolution there looks like this:
kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                     stddev=1e-4, wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
So it is a 5x5 convolution with 3 input channels (one for each color channel: red, green, and blue), generating 64 feature maps.
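For reference, a minimal shape check (TF 2.x eager style, random data; the batch size is made up, 24x24 is the tutorial's crop size) shows what this layer outputs:

import tensorflow as tf

# Made-up batch of 8 RGB images (24x24 is the tutorial's crop size).
images = tf.random.normal([8, 24, 24, 3])

# Same kernel shape as conv1: 5x5 window, 3 input channels, 64 output maps.
kernel = tf.random.normal([5, 5, 3, 64])

conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding='SAME')
print(conv.shape)  # (8, 24, 24, 64) -- 64 feature maps in total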
However, the second convolution layer takes an input of 64 feature maps:
kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
                                     stddev=1e-4, wd=0.0)
conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
...so how does this process the color information? Does this mean the different color channels are somehow spread across the 64 feature maps of convolution layer 1?
I thought conv layer 1 produces 64 feature maps for each color channel, ending up with 3 * 64 = 192 feature maps... but obviously I was wrong.
How is the color information mixed in conv layer 1?
See equation 3 in the description of cuDNN here.
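Ignoring padding and striding details, that equation is roughly of the form (my paraphrase, with index names following the paper's convention):

y[n, k, p, q] = sum over c, r, s of ( x[n, c, p + r, q + s] * w[k, c, r, s] )

where c runs over the input channels (3 here), r and s run over the 5x5 filter window, and k runs over the 64 output maps.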
Basically, for a single example (n), a single row (p), and a single column (q), the result of the spatial convolution is a weighted sum over 5x5x3 values. So each activation contains information from all 3 color channels.
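To make that concrete, here is a minimal sketch (random data, made-up sizes) that recomputes one activation by hand and compares it against tf.nn.conv2d:

import numpy as np
import tensorflow as tf

# Toy NHWC input with 3 channels and a 5x5 kernel producing 64 maps.
x = np.random.rand(1, 8, 8, 3).astype(np.float32)
w = np.random.rand(5, 5, 3, 64).astype(np.float32)

conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID').numpy()

# The activation at (n=0, p=0, q=0) for map k=0 is a weighted sum over a
# 5x5x3 patch, so all three color channels contribute to it:
manual = np.sum(x[0, 0:5, 0:5, :] * w[:, :, :, 0])
print(np.allclose(conv[0, 0, 0, 0], manual))  # True

The second convolution layer works the same way, only summing over 5x5x64 values, so each of its activations mixes information from all 64 incoming feature maps.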