 

Convolutional Neural Networks - Multiple Channels

How is the convolution operation carried out when multiple channels are present at the input layer? (e.g. RGB)

After doing some reading on the architecture/implementation of a CNN, I understand that each neuron in a feature map references NxM pixels of an image, as defined by the kernel size. Each pixel is then multiplied by the feature map's learned NxM weight set (the kernel/filter), the products are summed, and the sum is passed into an activation function. For a simple greyscale image, I imagine the operation would adhere to something like the following pseudocode:

    for i in range(image_width - kernel_width + 1):
        for j in range(image_height - kernel_height + 1):
            total = 0.0                                  # reset the accumulator for each output position
            for x in range(kernel_width):
                for y in range(kernel_height):
                    total += kernel[x, y] * image[i + x, j + y]
            feature_map[i, j] = act_func(total)

However I don't understand how to extend this model to handle multiple channels. Are three separate weight sets required per feature map, shared between each colour?

Referencing the 'Shared Weights' section of this tutorial: http://deeplearning.net/tutorial/lenet.html Each neuron in a feature map references layer m-1, with the colours being referenced from separate neurons. I don't understand the relationship they are expressing here. Are the neurons kernels or pixels, and why do they reference separate parts of the image?

Based on my example, it would seem that a single neuron's kernel is exclusive to a particular region of an image. Why have they split the RGB components over several regions?

Asked by Junkers on Jan 01 '15.




2 Answers

How is the convolution operation carried out when multiple channels are present at the input layer? (e.g. RGB)

In such a case you have one 2D kernel per input channel (a.k.a. plane).

So you perform each convolution (2D input, 2D kernel) separately, and you sum the contributions, which gives the final output feature map.
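To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the tutorial or the slides) that extends the question's pseudocode to an RGB input: one 2D kernel per channel, with the per-channel results summed into a single feature map. The function name conv2d_multichannel and the array layouts are assumptions made for illustration.

    import numpy as np

    def conv2d_multichannel(image, kernels):
        """image: (H, W, C) array; kernels: (C, kH, kW) array -> (H-kH+1, W-kW+1) feature map."""
        H, W, C = image.shape
        _, kH, kW = kernels.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for c in range(C):                              # one 2D convolution per input channel
            for i in range(H - kH + 1):
                for j in range(W - kW + 1):
                    patch = image[i:i + kH, j:j + kW, c]
                    out[i, j] += np.sum(patch * kernels[c])   # contributions summed across channels
        return out

    rgb = np.random.rand(6, 6, 3)                       # toy RGB image
    weights = np.random.rand(3, 3, 3)                   # one 3x3 kernel per R, G, B plane
    print(conv2d_multichannel(rgb, weights).shape)      # (4, 4)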

Please refer to slide 64 of this CVPR 2014 tutorial by Marc'Aurelio Ranzato.


Are three separate weight sets required per feature map, shared between each colour?

If you consider a given output feature map, you have 3 x 2D kernels (i.e. one kernel per input channel). Each 2D kernel's weights are shared across the whole of its input channel (R, G, or B here).

So the whole convolutional layer is a 4D-tensor (nb. input planes x nb. output planes x kernel width x kernel height).
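As a rough sketch of that 4D layout (again just an illustration, reusing conv2d_multichannel and the import from the snippet above; the (output planes, input planes, kernel height, kernel width) ordering is one possible convention, not necessarily the one your framework uses):

    def conv_layer(image, layer_weights):
        """image: (H, W, C_in); layer_weights: (C_out, C_in, kH, kW) -> (H', W', C_out)."""
        maps = [conv2d_multichannel(image, layer_weights[o])   # one multi-channel conv per output plane
                for o in range(layer_weights.shape[0])]
        return np.stack(maps, axis=-1)

    layer_w = np.random.rand(16, 3, 5, 5)               # 16 output planes, 3 input planes, 5x5 kernels
    print(conv_layer(np.random.rand(32, 32, 3), layer_w).shape)   # (28, 28, 16)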

Why have they split the RGB component over several regions?

As detailed above, think of each R, G, and B channel as a separate input plane with its own dedicated 2D kernel.

Answered by deltheil.


For example, if your input image is of size W x H x C, where W, H, and C are the width, the height, and the number of channels, then the dimension of each filter (a.k.a. kernel) would be K x K x C, where K is the kernel's spatial size. Aggregating the results of the different channels with a max, rather than summing them, would fail to distinguish the nuances across channels, which is not what we want. As illustrated in the figure below (source), the input data is of size 6 x 6 x 3, there are 2 units (filters), each of dimension 3 x 3 x 3, and the output is 4 x 4 x 2. So, in general, the channels are treated separately under each filter, each channel getting its own slice of that filter's weights.

[Figure: two 3 x 3 x 3 filters applied to a 6 x 6 x 3 input, producing a 4 x 4 x 2 output]
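As a quick check of those numbers (a NumPy toy of my own, not the code behind the figure), a 6 x 6 x 3 input convolved with two 3 x 3 x 3 filters does give a 4 x 4 x 2 output:

    import numpy as np

    image = np.random.rand(6, 6, 3)                     # 6 x 6 x 3 input
    filters = np.random.rand(2, 3, 3, 3)                # 2 filters, each 3 x 3 x 3

    H, W, _ = image.shape
    n_f, kH, kW, _ = filters.shape
    out = np.zeros((H - kH + 1, W - kW + 1, n_f))
    for f in range(n_f):
        for i in range(H - kH + 1):
            for j in range(W - kW + 1):
                # element-wise product over all three channels, then a single sum
                out[i, j, f] = np.sum(image[i:i + kH, j:j + kW, :] * filters[f])
    print(out.shape)                                    # (4, 4, 2)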

Answered by Li-Pin Juan.