Dimensions in convolutional neural network

I am trying to understand how the dimensions in a convolutional neural network behave. In the figure below, the input is a 28-by-28 matrix with 1 channel. Then there are 32 5-by-5 filters (with stride 2 in height and width), so I understand that the result is 14-by-14-by-32. But in the next convolutional layer we have 64 5-by-5 filters (again with stride 2). Why is the result 7-by-7-by-64 and not 7-by-7-by-(32*64)? Aren't we applying each one of the 64 filters to each one of the 32 channels?

[Figure: diagram of the network architecture described above]

Miriam Farber asked Mar 10 '17 07:03

2 Answers

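Each filter spans all channels of the previous layer. This means that a "5x5" filter in the second layer is really 5x5x32: at every position it computes a weighted sum of 32*5*5 = 800 input values and produces a single number, so one filter yields exactly one output channel. The weights are shared across spatial positions, not across channels; within a filter, each of the 32 input channels has its own 5x5 slice of weights. There are 64 such filters, hence 64 output channels. A better explanation with images can be found here: http://cs231n.github.io/convolutional-networks/.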
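
To make the summation over channels concrete, here is a minimal NumPy sketch (it is not part of the original answer). It assumes symmetric zero padding of 2 pixels, which is one way to get the 14 -> 7 reduction from the question with a 5x5 filter and stride 2; the function name `conv2d` and the random inputs are purely illustrative.

```python
import numpy as np

def conv2d(x, w, stride=2, pad=2):
    """Naive strided convolution (cross-correlation, as in most DL libraries).

    x: input of shape (H, W, C_in)
    w: filters of shape (k, k, C_in, C_out)
    """
    k, _, c_in, c_out = w.shape
    assert x.shape[2] == c_in, "filter depth must match input depth"
    # Zero-pad height and width only (assumed padding scheme).
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h_out = (xp.shape[0] - k) // stride + 1
    w_out = (xp.shape[1] - k) // stride + 1
    out = np.zeros((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            # Patch covers a k-by-k window across ALL input channels.
            patch = xp[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Each filter reduces k*k*C_in values to one number,
            # so the result here has length C_out, not C_in * C_out.
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 32))    # output of the first layer
w = rng.standard_normal((5, 5, 32, 64))  # 64 filters, each 5x5x32
y = conv2d(x, w)
print(y.shape)  # (7, 7, 64)
```

The channel axis of the input is consumed by the sum inside each filter, which is why the output depth equals the number of filters (64) rather than 32*64.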

Thomas Pinetz answered Oct 05 '22 19:10


The depth is usually given implicitly. For example, many images are considered to have depth 3 (for the three color channels in each pixel), so a "5x5 filter" applied to such an image means a 5x5x3 filter. In your case, the 5x5 filter in the second layer is really a 5x5x32 filter.

Depth one is usually explicitly stated (as in "5x5x1 filter").
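
As a quick check of this convention, the weight counts for the two layers in the question can be worked out in a few lines of Python. This is only a sketch: the helper `num_weights` is hypothetical and biases are ignored.

```python
def num_weights(k, c_in, c_out):
    """Weights in a conv layer with c_out filters of size k x k x c_in (no biases)."""
    return k * k * c_in * c_out

# Layer 1: 32 filters, each really 5x5x1 (input has 1 channel).
layer1 = num_weights(5, 1, 32)
# Layer 2: 64 filters, each really 5x5x32 (input has 32 channels).
layer2 = num_weights(5, 32, 64)

print(layer1)  # 800
print(layer2)  # 51200
```

The factor of 32 shows up in the number of weights per filter, not in the number of output channels.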

Lennart Scharmann answered Oct 05 '22 18:10
