I've looked through a lot of explanations of the way a CNN conventionally handles multiple channels (such as 3 in an RGB image) and am still at a loss.
When a 5x5x3 filter (say) is applied to a patch of an RGB image what exactly happens? Is it in fact 3 different 2D convolutions (with independent weights) that happen separately to each channel? And then the results get simply added together to produce the final output to pass to the next layer? Or is it a truly 3D convolution?
Just like any other layer, a convolutional layer receives input, transforms the input in some way, and then outputs the transformed input to the next layer. The inputs to convolutional layers are called input channels, and the outputs are called output channels.
The first step of a 2D convolution over multiple channels: each kernel in the filter is applied to its own channel of the input layer, separately (three kernels for three channels). The three resulting maps are then summed together (element-wise addition) to form one single channel (3 x 3 x 1).
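The two steps described here (one 2D pass per channel, then an element-wise sum) can be sketched in plain Python. This is a minimal illustration, not any library's implementation: the function names `conv2d_single` and `conv_multichannel`, the 5 x 5 input size, and the toy values are all assumptions made for the example. As in most CNN frameworks, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_single(channel, kernel):
    """Slide one 2D kernel over one 2D channel (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [
            sum(channel[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_multichannel(image, filt):
    """image: C channels of H x W; filt: C kernels, one per channel."""
    # Step 1: each kernel is applied to its own channel, separately.
    per_channel = [conv2d_single(ch, k) for ch, k in zip(image, filt)]
    # Step 2: the per-channel maps are added element-wise into one map.
    out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(m[i][j] for m in per_channel) for j in range(out_w)]
        for i in range(out_h)
    ]


# Hypothetical toy data: 3 channels of 5 x 5 values.
image = [[[c + i + j for j in range(5)] for i in range(5)] for c in range(3)]
# Independent weights per channel: kernel c picks the centre pixel,
# scaled by c + 1.
filt = [
    [[c + 1 if (a == 1 and b == 1) else 0 for b in range(3)] for a in range(3)]
    for c in range(3)
]

out = conv_multichannel(image, filt)
print(len(out), len(out[0]))  # 3 3  (a single 3 x 3 channel)
print(out[0][0])              # 20
```

Summing the three per-channel results gives the same number as multiplying all 27 values of the patch by the 27 filter weights and adding them in one go, so "three 2D convolutions added together" and "one sum over the 3D patch" describe the same computation. The filter only slides along height and width, never along depth, which is why this is not usually called a 3D convolution.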
A convolutional neural network (CNN) works by taking an image as input, assigning learnable weights to the features it detects in that image, and using those features to distinguish one image or object from another. A CNN typically needs much less hand-crafted preprocessing than many other classification approaches, because the filters are learned from the data.
A convolution describes how the input is modified by a filter. A convolutional layer uses multiple filters: each one slides across the image and produces its own output map, so different filters learn to respond to different patterns in the input image.
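To make the role of multiple filters concrete, here is a small sketch in plain Python, assuming square kernels, stride 1 and no padding. The names `apply_filter` and `conv_layer` and the all-ones input are hypothetical, chosen only for illustration. Each filter spans every input channel and yields one output channel, so two filters give two output channels.

```python
def apply_filter(image, filt):
    """One filter (C x k x k) over one image (C x H x W) -> one 2D map."""
    c_in, k = len(filt), len(filt[0])
    out_h = len(image[0]) - k + 1
    out_w = len(image[0][0]) - k + 1
    return [
        [
            sum(image[c][i + a][j + b] * filt[c][a][b]
                for c in range(c_in) for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image, filters):
    """Apply every filter to the same input; one output channel per filter."""
    return [apply_filter(image, f) for f in filters]


# Hypothetical input: 3 channels of 6 x 6, every value equal to 1.
image = [[[1] * 6 for _ in range(6)] for _ in range(3)]
filters = [
    [[[1] * 3 for _ in range(3)] for _ in range(3)],  # sums the whole patch
    [[[0] * 3 for _ in range(3)] for _ in range(3)],  # ignores the input
]

feature_maps = conv_layer(image, filters)
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))  # 2 4 4
print(feature_maps[0][0][0], feature_maps[1][0][0])                      # 27 0
```

The number of output channels equals the number of filters, independent of how many channels the input has.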
This image is from Andrew Ng's deeplearning.ai course. The input is 6 X 6 X 3, where 6 X 6 is the height and width of the image and 3 corresponds to the 3 color channels. For the convolution step we convolve the input image with a 3 X 3 X 3 filter/kernel. The input image and the filter both have 3 layers: the depth of the filter must match the number of channels in the input. The output will be 4 X 4 X 1 (with a stride of 1 and no padding, 6 - 3 + 1 = 4).
The 3 X 3 X 3 filter gives you 27 weights, which you multiply with the corresponding 27 values from the red, green and blue channels of the patch under the filter. Add up all those products to get the value for [0,0] in the 4 X 4 output. Now move the yellow cube over the input image one box to the right; once it reaches the right end, slide the cube one row down, go back to the left edge, and continue multiplying and adding until the 4 X 4 output is filled. I would suggest taking paper and pencil, filling in random values for the input as well as the kernel, and working through the multiplication by hand.
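The paper-and-pencil exercise can also be checked with a few lines of plain Python. This is only a sketch of the walkthrough above: the helper name `patch_products`, the height x width x channel layout, and the toy values (input value = row + column + channel, all filter weights equal to 1) are assumptions made so the numbers are easy to verify by hand.

```python
H, W, C, K = 6, 6, 3, 3  # 6 X 6 X 3 input, 3 X 3 X 3 filter

# image[i][j][c]: row i, column j, color channel c (toy values).
image = [[[i + j + c for c in range(C)] for j in range(W)] for i in range(H)]
# kernel[a][b][c]: one weight per position and channel (all ones here).
kernel = [[[1 for _ in range(C)] for _ in range(K)] for _ in range(K)]


def patch_products(image, kernel, i, j):
    """The 27 products for the patch whose top-left corner is (i, j)."""
    return [
        image[i + a][j + b][c] * kernel[a][b][c]
        for a in range(K) for b in range(K) for c in range(C)
    ]


out_size = H - K + 1  # 6 - 3 + 1 = 4 positions in each direction

# Slide the cube left to right, then one row down, filling the 4 X 4 output.
output = [
    [sum(patch_products(image, kernel, i, j)) for j in range(out_size)]
    for i in range(out_size)
]

print(len(patch_products(image, kernel, 0, 0)))  # 27
print(output[0][0])                              # 81
print(len(output), len(output[0]))               # 4 4
```

Each output value is a single number, which is why the result is 4 X 4 X 1 even though both the input and the filter have depth 3.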
For more details, watch these lectures on YouTube: https://www.youtube.com/watch?v=KTB_OFoAQcc&index=6&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF
https://www.youtube.com/watch?v=7g8jpK4llkc&t=1s