Consider a Convolutional Neural Network with the following architecture:

Input → $C_1$ → $P_1$ → $C_2$ → $P_2$ → Softmax

Here $C_i$ refers to the $i$-th convolutional layer and $P_i$ refers to the $i$-th mean pooling layer. Corresponding to each layer there will be an output. Let $\delta_{C_i}$ refer to the error in the output of layer $C_i$ (and the same for $\delta_{P_i}$).
$\delta_{P_2}$ can be calculated easily using the normal backpropagation equations, since $P_2$ is fully connected to the softmax layer. $\delta_{C_2}$ can be calculated simply by upsampling $\delta_{P_2}$ appropriately (and multiplying it by the gradient of the output of $C_2$), since we are using mean pooling.
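For concreteness, here is a minimal MATLAB sketch of this upsampling step, assuming a 2×2 mean pooling region and a sigmoid activation (the activation choice and the layer sizes are assumptions for illustration):

```matlab
% Propagate error back through a 2x2 mean pooling layer (sketch).
poolDim = 2;                 % assumed pooling block size
deltaP2 = rand(3, 3);        % dummy error at the output of P2
outC2   = rand(6, 6);        % dummy output of C2 (sigmoid assumed)

% Upsample: spread each pooled error uniformly over its 2x2 block,
% dividing by the block area because mean pooling averaged it.
deltaC2 = kron(deltaP2, ones(poolDim) / poolDim^2);

% Multiply elementwise by the activation derivative; for a sigmoid
% output a, the derivative is a .* (1 - a).
deltaC2 = deltaC2 .* outC2 .* (1 - outC2);
```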
How do we propagate the error from the output of $C_2$ to the output of $P_1$? In other words, how do we find $\delta_{P_1}$ from $\delta_{C_2}$?
Stanford's UFLDL Deep Learning tutorial uses the following equation to do this:

$$\delta_k^{(l)} = \text{upsample}\left(\left(W_k^{(l)}\right)^T \delta_k^{(l+1)}\right) \bullet f'\left(z_k^{(l)}\right)$$

where $k$ indexes the filter number and $\bullet$ denotes elementwise multiplication.
However, I am facing the following problems when using this equation:
My $W_k^{(l)}$ has size (2×2) and $\delta_k^{(l+1)}$ has size (6×6) (I am using valid convolutions; the output of $P_1$ has size (13×13) and the output of $C_2$ has size (6×6)). The inner matrix multiplication $\left(W_k^{(l)}\right)^T \delta_k^{(l+1)}$ does not even make sense in my case.
The equation assumes that the number of channels in both layers is the same. Again, this is not true for me: the output of $P_1$ has 64 channels while the output of $C_2$ has 96 channels.
What am I doing wrong here? Can anybody please explain how to propagate errors through a convolutional layer?
A simple MATLAB example would be highly appreciated.
A good point to note here is that pooling layers do not do any learning themselves. The function of the pooling layer is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.
During forward propagation, a P × P pooling block is reduced to a single value; in the case of max pooling, this is the value of the “winning unit”. To keep track of the “winning unit”, its index is noted during the forward pass and used for gradient routing during backpropagation.
During backpropagation, the gradients in the convolutional layer are calculated first; the backward pass to the pooling layer then assigns the “winning unit” the gradient value from the convolutional layer, using the index noted during the forward pass.
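A small MATLAB sketch of this forward-pass bookkeeping, assuming a 4×4 input map and 2×2 pooling blocks (all sizes are illustrative):

```matlab
% Forward pass of max pooling over a 4x4 map with 2x2 blocks,
% recording the linear index of each block's "winning unit".
P = 2;                              % assumed pooling block size
x = rand(4, 4);                     % input to the pooling layer
pooled  = zeros(size(x) / P);       % pooled output
winners = zeros(size(x) / P);       % winner index within each block

for i = 1:P:size(x, 1)
    for j = 1:P:size(x, 2)
        block = x(i:i+P-1, j:j+P-1);
        [v, idx] = max(block(:));   % winner value and its index
        pooled((i-1)/P + 1, (j-1)/P + 1)  = v;
        winners((i-1)/P + 1, (j-1)/P + 1) = idx;
    end
end
```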
Gradient routing is done in the following ways (a combined sketch follows this list):
Max pooling - the error is assigned entirely to the “winning unit”, because the other units in the pooling block did not contribute to the pooled output; all the other units are therefore assigned a value of zero.
Average pooling - the error is multiplied by $1 / (P \times P)$ and assigned to the whole pooling block (all units get this same value).
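A rough MATLAB sketch of both routing schemes, again assuming a 4×4 input map and 2×2 pooling blocks (the winning index is recomputed here so the snippet stays self-contained):

```matlab
% Route a 2x2 gradient map back through 2x2 pooling blocks of a
% 4x4 input, for both max and average pooling. Sizes are illustrative.
P = 2;                         % pooling block size
x = rand(4, 4);                % forward-pass input to the pooling layer
dOut = rand(2, 2);             % gradient arriving at the pooled output

dMax = zeros(size(x));         % gradient routed by max pooling
dAvg = zeros(size(x));         % gradient routed by average pooling

for i = 1:P:size(x, 1)
    for j = 1:P:size(x, 2)
        block = x(i:i+P-1, j:j+P-1);
        g = dOut((i-1)/P + 1, (j-1)/P + 1);

        % Max pooling: the whole gradient goes to the "winning unit";
        % every other unit in the block gets zero.
        [~, winner] = max(block(:));
        dBlock = zeros(P, P);
        dBlock(winner) = g;
        dMax(i:i+P-1, j:j+P-1) = dBlock;

        % Average pooling: every unit contributed equally, so each
        % receives g / (P*P).
        dAvg(i:i+P-1, j:j+P-1) = g / (P * P);
    end
end
```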
Read a more comprehensive breakdown of the whole backpropagation procedure here.