I was looking at this printout of layers. I realized it shows input/output shapes, but nothing about how the RGB channels are dealt with.
If you look at block1_conv1, it says "Conv2D". But if the input is 224 x 224 x 3, then that's not 2D.
But my bigger, broader question is how 3-channel inputs are treated throughout the course of training a model like this (I think it's VGG16). Are the RGB channels combined (summed, or concatenated) at some point? When and where? Does that require some unique filter for that purpose? Or does the model run across the different channel/color representations separately from end to end?
The "2D" part of a 2D convolution refers neither to the dimension of the convolution input nor to the dimension of the filter itself, but to the space in which the filter is allowed to move (2 directions only). Another way of thinking about this is that each of the RGB channels gets its own 2D-array filter, applied separately, and the outputs are added at the end.
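A minimal NumPy sketch of this idea (the function name and toy shapes are illustrative, not from any library): the filter has a 2D slice per channel, but it only ever slides along height and width, and the per-channel products collapse into a single output channel.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Valid 2D convolution (cross-correlation, strictly speaking) of a
    multi-channel image with one multi-channel kernel. The kernel slides
    only along height and width -- never along the channel axis."""
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    assert C == kc, "kernel needs one 2D slice per input channel"
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each channel's patch meets its own 2D kernel slice,
            # and the per-channel results sum into one number.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw, :] * kernel)
    return out

rgb = np.random.rand(224, 224, 3)  # toy "image", like a VGG16 input
k = np.random.rand(3, 3, 3)        # one 3x3 filter with 3 channel slices
print(conv2d_single_filter(rgb, k).shape)  # (222, 222) -- no channel axis
```

Note that the output of a single filter is a single 2D map: the channel dimension has been absorbed by the summation.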
does the model run across the different channel/color representations separately from end to end?
Effectively it does this across each channel separately. For example, the first Conv2D layer takes in each of the 3 224x224 channels separately, and then applies a different 2D-array filter to each one. But this isn't end-to-end across all model layers, only within the layer during the convolution step.
But, you might ask, if there are 64 convolution filters and each is applied across every channel, why are there not 3*64 = 192 channels in the Conv2D output? This prompts your question:
Are the RGB channels combined (summed, or concatenated) at some point?
The answer is: yes. After the convolution filter is applied to each channel separately, the values for the three channels are added, and then a bias too, if you've specified one. See the diagram below (from Dive Into Deep Learning, under CC BY-SA 4.0):
The reason the separate channel layers are added is that there aren't actually 3 separate 2D-array filters, one per channel; technically there's just one 3D-array filter which only moves in two directions. You could think of it a bit like a hamburger: there's one 2D-array filter for one channel (the bun), another for the next channel (the lettuce), etc., but all the layers are stacked up and function as a whole, so the filter weights are all added together at once. There's no need for a special weight or filter to combine the channels, since the weights are already present during the convolution step (an extra one would just multiply two fitting parameters together, which may as well be one).
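The hamburger point can be checked numerically. A sketch (all names and shapes are made up for illustration): convolving each channel with its own 2D slice and then summing gives the same result as treating the stack as one 3D filter moved in two directions.

```python
import numpy as np

H, W, C, kh, kw = 8, 8, 3, 3, 3
img = np.random.rand(H, W, C)
kernel = np.random.rand(kh, kw, C)  # one "3D" filter: a stack of 2D slices

def conv_one_channel(chan, k2d):
    """Valid 2D convolution of a single channel with a single 2D kernel."""
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(chan[i:i+kh, j:j+kw] * k2d)
    return out

# Way 1: convolve each channel with its own 2D slice, then add the results.
per_channel = sum(conv_one_channel(img[:, :, c], kernel[:, :, c])
                  for c in range(C))

# Way 2: treat the stack as a single 3D filter that moves in 2 directions.
whole = np.zeros_like(per_channel)
for i in range(whole.shape[0]):
    for j in range(whole.shape[1]):
        whole[i, j] = np.sum(img[i:i+kh, j:j+kw, :] * kernel)

print(np.allclose(per_channel, whole))  # True
```

The two computations are the same sum grouped differently, which is why no extra combining weight is needed.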
This is a good resource.
If you look at block1_conv1, it says "Conv2D". But if the input is 224 x 224 x 3, then that's not 2D.
You misunderstand the meaning of 2D convolution. The 2D convolution filter moves across your input along the height and width of each channel. It never moves in the 3rd dimension. Because it moves in this plane, it is a 2D convolution. The actual filter is 3D, yes.
"by the end of the ConvNet architecture we will reduce the full image into a single vector of class scores" (Stanford CS231n's website)
The outputs of each layer of the filter, convolved with its respective input channel, are summed elementwise. A bias can then be added to all elements, if you want one. So, if you have 3 input channels and 1 filter, you get an X-Y-1 output. If you have 2 (or 1, or 3, or 4, or 1000) input channels and 15 filters, you still just get X-Y-15. X and Y are the output height and width, which depend on the other parameters of the convolution.
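A quick sketch of that shape rule (function and shapes invented for illustration): the number of output channels equals the number of filters, no matter how many input channels there are.

```python
import numpy as np

def conv2d(image, filters, bias=None):
    """Valid convolution: image is H x W x C, filters is F x kh x kw x C.
    Output is X x Y x F -- one output channel per filter, regardless of C."""
    H, W, C = image.shape
    F, kh, kw, _ = filters.shape
    X, Y = H - kh + 1, W - kw + 1
    out = np.zeros((X, Y, F))
    for f in range(F):
        for i in range(X):
            for j in range(Y):
                out[i, j, f] = np.sum(image[i:i+kh, j:j+kw, :] * filters[f])
    if bias is not None:
        out += bias  # one scalar per filter, broadcast over X and Y
    return out

img = np.random.rand(10, 10, 3)   # 3 input channels
w = np.random.rand(15, 3, 3, 3)   # 15 filters, each 3x3 with 3 channel slices
print(conv2d(img, w, bias=np.zeros(15)).shape)  # (8, 8, 15)
```

Whether the input had 3 channels or 1000, only the filters' last dimension would change; the output would still have 15 channels.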
3D convolutions are different, in that each filter is 3D and also moves across the channel dimension, so each weight is used in all channels.
It's a trick used to reduce the computational cost of a network by reducing the number of channels (reducing the dimensionality of the input), without doing any 'real' convolution calculations. This is useful because fewer channels means fewer weights to compute in any layer that uses this output as input, such as bigger (3x3 or 5x5) convolutions.
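Assuming the trick described above is a 1x1 convolution (the usual channel-reduction device, e.g. before the larger convolutions in Inception-style blocks), it can be sketched as nothing more than a per-pixel matrix multiply over the channel axis; all shapes here are made up for illustration:

```python
import numpy as np

# A 1x1 convolution mixes channels at each pixel independently:
# no spatial neighborhood is involved, so it reduces to a matrix
# multiply applied at every (height, width) position.
H, W, C_in, C_out = 28, 28, 256, 64
feature_map = np.random.rand(H, W, C_in)
w = np.random.rand(C_in, C_out)   # the 1x1 filter bank, flattened

reduced = feature_map @ w         # broadcasts over H and W
print(reduced.shape)              # (28, 28, 64) -- channels cut 4x
```

Any 3x3 or 5x5 convolution applied afterwards now needs weights for 64 input channels instead of 256, which is where the savings come from.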