
What is the depth of a convolutional neural network?

Tags: machine-learning, neural-network, deep-learning, conv-neural-network

I was looking at the convolutional neural network material from CS231n: Convolutional Neural Networks for Visual Recognition. In a convolutional neural network, the neurons are arranged in 3 dimensions (height, width, depth). I am having trouble with the depth of the CNN. I can't visualize what it is.

In the link they said: "The CONV layer's parameters consist of a set of learnable filters. Every filter is small spatially (along width and height), but extends through the full depth of the input volume."

For example, look at this picture. Sorry if the image is too crappy. [image: https://i.stack.imgur.com/qmf0m.jpg]

I can grasp the idea that we take a small area of the image and then compare it with the "filters". So will the filters be a collection of small images? They also said: "We will connect each neuron to only a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the receptive field of the neuron." So does the receptive field have the same dimensions as the filters? Also, what will the depth be here? And what does the depth of a CNN signify?

So my main question is: if I take an image with dimensions [32*32*3] (let's say I have 50000 of these images, making the dataset [50000*32*32*3]), what should I choose as its depth, and what does the depth mean? Also, what will the dimensions of the filters be?

It would also be very helpful if anyone could provide a link that gives some intuition on this.

EDIT: In one part of the tutorial (the real-world example part), it says: "The Krizhevsky et al. architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional Layer, it used neurons with receptive field size F=11, stride S=4 and no zero padding P=0. Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of K=96, the Conv layer output volume had size [55x55x96]."

Here we see the depth is 96. So is depth something that I choose arbitrarily, or something I compute? In the example above (Krizhevsky et al.) the depth was 96. What does a depth of 96 mean? The tutorial also stated: "Every filter is small spatially (along width and height), but extends through the full depth of the input volume."

So does that mean the depth will be like this? If so, can I assume Depth = Number of Filters? [image: https://i.stack.imgur.com/txz5T.jpg]

489

asked Aug 30 '15 07:08

Shubhashis

2 Answers

In deep neural networks, depth usually refers to how deep the network is (its number of layers), but in this context depth refers to the third dimension of a volume such as an image.

In this case you have an image, and the size of this input is 32x32x3, which is (width, height, depth). The neural network should be able to learn from these dimensions, as depth corresponds to the different channels of the training images.

UPDATE:

In each layer, your CNN learns regularities in the training images. In the very first layers the regularities are curves and edges; as you go deeper along the layers you start learning higher-level regularities such as colors, shapes, objects etc. This is the basic idea, but there are lots of technical details. Before going any further, give this a shot: http://www.datarobot.com/blog/a-primer-on-deep-learning/

UPDATE 2:

Have a look at the first figure in the link you provided. It says: 'In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).' It means that a ConvNet arranges its neurons in three dimensions.

As an answer to your question: for the input image, depth corresponds to its different color channels.

As for the filter depth, the tutorial states:

Every filter is small spatially (along width and height), but extends through the full depth of the input volume.

This basically means that a filter is a small window that slides across the width and height of the image while covering its full depth, in order to learn the regularities in the image.
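One way to make this concrete is to write out the value a single filter produces at one output position. This is only a sketch under assumed notation (none of these symbols appear in the tutorial quote): x is the input volume, w is one filter of spatial size F and depth D, S is the stride, and the bias term is left out.

```latex
y_{i,j} = \sum_{u=0}^{F-1} \sum_{v=0}^{F-1} \sum_{c=0}^{D-1} x_{iS+u,\; jS+v,\; c} \, w_{u,v,c}
```

The sums over u and v cover the small spatial window, and the sum over c runs over every channel, which is the "extends through the full depth" part. For an 11×11 filter on a 3-channel image, each output value adds up 11 · 11 · 3 = 363 products.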

UPDATE 3:

For the real-world example, I just browsed the original paper, and it says: "The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels."
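As a rough check on these numbers, the arithmetic quoted in the question, (227 - 11)/4 + 1 = 55, can be wrapped in a small helper. This is a minimal sketch: the function name `conv_output_size` and its parameter names are made up for illustration, and the padding term 2P is the usual generalization of the tutorial's P=0 case.

```python
def conv_output_size(input_size, field_size, stride, padding=0):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1.

    Raises ValueError when the filter does not tile the input evenly.
    """
    span = input_size - field_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError(
            f"W={input_size}, F={field_size}, S={stride}, P={padding} "
            "does not give a whole number of positions"
        )
    return span // stride + 1


# The tutorial's numbers: W=227, F=11, S=4, P=0.
print(conv_output_size(227, 11, 4))  # 55

# The 224 quoted from the paper does not divide evenly with these settings.
try:
    conv_output_size(224, 11, 4)
except ValueError as err:
    print("224 input:", err)
```

Note that this only gives the width and height of the output. The third dimension (96 here) is not computed from the input at all; it is the number of kernels the designer chose.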

The tutorial refers to the depth as the channels, but in the real world you can design whatever dimensions you like. After all, that is your design.

The tutorial aims to give you a glimpse of how ConvNets work in theory, but if I design a ConvNet, nobody can stop me from proposing one with a different depth.

Does this make any sense?

112

answered Oct 04 '22 16:10

Semih Yagcioglu

The depth of a CONV layer is the number of filters it uses. The depth of a filter is equal to the depth of the image (or volume) it takes as input.

For example: let's say you are using an image of size 227*227*3. Now suppose you are using a filter with a spatial size of 11*11. This 11*11 square is slid across the whole image to produce a single 2-dimensional array as a response. But in order to do so, it must cover everything inside the 11*11 area, across all channels. Therefore the depth of the filter is the depth of the image = 3. Now suppose we have 96 such filters, each producing a different response. This will be the depth of the convolutional layer. It is simply the number of filters used.
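The example above can be sketched in plain Python. This is a minimal, unoptimized illustration with toy sizes chosen only so that it runs quickly (a 7x7x3 input and 4 filters of 3x3x3 with stride 2, none of which come from the tutorial). The function names `conv_layer` and `shape` are made up for this sketch, and padding and biases are left out.

```python
def conv_layer(volume, filters, stride):
    """Naive convolution without padding or bias.

    volume:  nested list indexed [row][col][channel]
    filters: list of square filters, each indexed [row][col][channel]
    Returns a nested list indexed [row][col][filter_index].
    """
    height, width, depth = len(volume), len(volume[0]), len(volume[0][0])
    size = len(filters[0])  # spatial size F of each filter
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1

    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            responses = []
            for filt in filters:
                # Every filter must span the full depth of the input.
                assert len(filt[0][0]) == depth
                total = 0.0
                for di in range(size):
                    for dj in range(size):
                        for c in range(depth):
                            pixel = volume[i * stride + di][j * stride + dj][c]
                            total += pixel * filt[di][dj][c]
                responses.append(total)  # one number per filter
            out_row.append(responses)
        output.append(out_row)
    return output


def shape(volume):
    """(rows, cols, depth) of a nested-list volume."""
    return (len(volume), len(volume[0]), len(volume[0][0]))


# Toy data: a 7x7x3 image of ones, and 4 filters of 3x3x3 filled with 0, 1, 2, 3.
image = [[[1.0] * 3 for _ in range(7)] for _ in range(7)]
filters = [[[[float(k)] * 3 for _ in range(3)] for _ in range(3)] for k in range(4)]

activation = conv_layer(image, filters, stride=2)
print(shape(filters[0]))   # (3, 3, 3): filter depth equals input depth
print(shape(activation))   # (3, 3, 4): output depth equals number of filters
```

The output depth is 4 because there are 4 filters. With the 227*227*3 image, 11*11*3 filters, stride 4 and 96 filters from the example, the same loop would give a 55x55x96 volume, just much more slowly.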

answered Oct 04 '22 16:10

Adarsh Maurya
