From what I understood from CS231n Convolutional Neural Networks for Visual Recognition, the size of the output volume represents the number of neurones, given the following parameters: W (input size), F (filter size), S (stride) and P (zero padding).
I posted two examples. I have no problem with example 1, but example 2 confuses me.
In the Real-world example section they start with a [227 x 227 x 3] input image. The parameters are the following: F = 11, S = 4, P = 0, W = 227.
We note that the convolution has a depth of K = 96. (Why?)
The size of the output volume is (227 - 11)/4 + 1 = 55. So we will have 55 x 55 x 96 = 290,400 neurones, each pointing (excuse me if I butchered the term) to an [11 x 11 x 3] region in the image, which is the region the kernel is applied to when computing the dot product.
The following example is taken from the Numpy examples section. We have an input image with the shape [11 x 11 x 4]. The parameters used to compute the size of the output volume are the following: W = 11, P = 0, S = 2 and F = 5.
We note that the convolution has a depth of K = 4.
The formula (11 - 5)/2 + 1 = 4 produces only 4 neurones. Each neurone points to a region of size [5 x 5 x 4] in the image.
It seems that they are moving the kernel in the x direction only. Shouldn't we have 12 neurones, each having [5 x 5 x 4] weights?
V[0,0,0] = np.sum(X[:5,:5,:] * W0) + b0
V[1,0,0] = np.sum(X[2:7,:5,:] * W0) + b0
V[2,0,0] = np.sum(X[4:9,:5,:] * W0) + b0
V[3,0,0] = np.sum(X[6:11,:5,:] * W0) + b0
Why is K = 96 in example 1?
Example 1
Why does the convolution have a depth of K = 96?
The depth (K) is equal to the number of filters used in the convolutional layer. It is a hyperparameter chosen by whoever designs the network, not something derived from the input size. A bigger number usually gives better results, at the cost of slower training. Complex images would require more filters. I usually start tests with 32 filters on the first layer and 64 on the second layer.
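To make the role of K concrete, here is a minimal sketch of how the output volume is sized. The helper name `conv_output_shape` is my own (it is not from CS231n), and it assumes a square input and a square filter. The spatial side comes from (W - F + 2P)/S + 1, and K is simply appended as the depth:

```python
def conv_output_shape(W, F, S, P, K):
    """Return (height, width, depth) of a conv layer's output volume.

    Assumes a square input of side W and a square filter of side F.
    K is the number of filters, which becomes the output depth.
    """
    span = W - F + 2 * P
    if span % S != 0:
        raise ValueError("filter does not tile the input evenly with this stride")
    side = span // S + 1
    return (side, side, K)


# Example 1 from the question: 227 x 227 input, F = 11, S = 4, P = 0, K = 96
shape = conv_output_shape(W=227, F=11, S=4, P=0, K=96)
neurons = shape[0] * shape[1] * shape[2]
print(shape, neurons)  # (55, 55, 96) 290400
```

Changing K only changes the last entry of the shape; the 55 x 55 part depends on W, F, S and P alone.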
Example 2
The formula (11 - 5)/2 + 1 = 4 produces only 4 neurones.
I'm no expert, but I think this is false. The formula only defines the spatial output size (height and width). The output of a convolutional layer has a spatial size (height and width) and a depth. The spatial size is given by this formula, the depth by the number of filters used. The total number of neurons is:
height * width * depth = 4 * 4 * 4 = 64
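As a rough illustration, here is a naive NumPy sketch that slides the filters along both spatial axes, not only x. The random data, the variable names, the input depth of 4 (chosen to match the [5 x 5 x 4] filters) and K = 4 are my assumptions for the demo. The indexing follows the `V[x, y, d]` convention of the snippets in the question:

```python
import numpy as np

rng = np.random.default_rng(0)

F, S, K = 5, 2, 4                      # filter size, stride, number of filters (K = 4 assumed)
X = rng.standard_normal((11, 11, 4))   # input volume, depth 4 to match the [5 x 5 x 4] filters
W = rng.standard_normal((K, F, F, 4))  # one [5 x 5 x 4] weight block per filter
b = rng.standard_normal(K)             # one bias per filter

out = (X.shape[0] - F) // S + 1        # (11 - 5) / 2 + 1 = 4
V = np.zeros((out, out, K))

for d in range(K):                     # depth: one slice per filter
    for y in range(out):               # slide along the second axis as well
        for x in range(out):           # slide along the first axis
            patch = X[x * S:x * S + F, y * S:y * S + F, :]
            V[x, y, d] = np.sum(patch * W[d]) + b[d]

print(V.shape, V.size)  # (4, 4, 4) 64
```

The four lines quoted in the question correspond to `y = 0`, `d = 0` here, so they cover only the first column of the first depth slice. The remaining entries come from also stepping `y` and `d`.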