 

Backpropagation in Convolutional Neural Networks

Consider a Convolutional Neural Network with the following architecture:

[Figure: CNN architecture - C_1 -> P_1 -> C_2 -> P_2 -> softmax]

Here C_i refers to the i-th convolutional layer and P_i refers to the i-th mean-pooling layer. Each layer produces an output. Let delta^P_j denote the error in the output of layer P_j (and similarly delta^C_j for layer C_j).

delta^P_2 can be calculated easily using the normal backpropagation equations, since P_2 is fully connected to the softmax layer. delta^C_2 can then be obtained simply by upsampling delta^P_2 appropriately (and multiplying by the gradient of the output of C_2), since we are using mean pooling.
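A minimal MATLAB sketch of that upsampling step, assuming 2x2 non-overlapping mean pooling, a 12x12x96 output at C_2, and a sigmoid activation at C_2 (all variable names are hypothetical stand-ins):

    delta_P2 = rand(6, 6, 96);     % stand-in: error at the output of P_2
    a_C2     = rand(12, 12, 96);   % stand-in: activations (outputs) of C_2

    delta_C2 = zeros(12, 12, 96);
    for k = 1:96
        % mean pooling: spread each error value evenly over its 2x2 block
        up = kron(delta_P2(:, :, k), ones(2) / 4);           % 6x6 -> 12x12
        % multiply by the gradient of C_2's output (sigmoid assumed here)
        delta_C2(:, :, k) = up .* a_C2(:, :, k) .* (1 - a_C2(:, :, k));
    end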

How do we propagate error from the output of C_2 to the output of P_1? In other words, how do we find delta^P_1 from delta^C_2?

Stanford's UFLDL Deep Learning tutorial uses the following equation to do this:

delta_k^l = upsample( (W_k^l)^T delta_k^(l+1) ) .* f'(z_k^l)

However I am facing the following problems in using this equation:

  1. My W_k^l has size (2x2) and delta_k^l has size (6x6). (I am using valid convolutions; the output of P_1 has size (13x13) and the output of P_2 has size (6x6).) The inner matrix multiplication does not even make sense in my case.

  2. The equation assumes that the number of channels in both layers is the same. Again, this is not true for me: the output of P_1 has 64 channels while the output of C_2 has 96 channels.

What am I doing wrong here? Can anybody please explain how to propagate errors through a convolutional layer?

A simple MATLAB example would be highly appreciated.
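For concreteness, here is a sketch of the operation that makes both size problems disappear: a 'full' convolution with the 180-degree-rotated filters, accumulated over all output channels. This assumes the forward pass computed each C_2 feature map as a sum of conv2(..., 'valid') terms over the 64 input channels with 2x2 filters, so that C_2's output is 12x12 before P_2 halves it to 6x6 (names hypothetical):

    delta_C2 = rand(12, 12, 96);     % stand-in: error at the output of C_2
    W        = rand(2, 2, 64, 96);   % stand-in: W(:,:,c,k) maps channel c to map k

    delta_P1 = zeros(13, 13, 64);
    for c = 1:64
        for k = 1:96
            % 'full' convolution grows the 12x12 error map back to 13x13
            delta_P1(:, :, c) = delta_P1(:, :, c) + ...
                conv2(delta_C2(:, :, k), rot90(W(:, :, c, k), 2), 'full');
        end
    end

The sum over k is what reconciles the 96 channels of C_2 with the 64 channels of P_1, and the 'full' mode restores the rows and columns removed by the valid convolution. Under this reading, the (W_k^l)^T in the tutorial's equation is shorthand for a flipped-filter convolution, not a literal matrix product.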

Asked Oct 30 '22 by Shubham Gupta


1 Answer

A good point to note here is that pooling layers do not do any learning themselves. The function of a pooling layer is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.

During forward propagation, a P-by-P pooling block is reduced to a single value. For max pooling this is the value of the “winning unit”, whose index is noted during the forward pass and used for gradient routing during backpropagation.

During backpropagation, the gradients in the convolutional layers are calculated first; the backward pass through the pooling layer then assigns the gradient value from the convolutional layer to the “winning unit”, whose index was recorded during the forward pass, as sketched below.
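A minimal MATLAB sketch of this index tracking and routing for a single channel, assuming 2x2 non-overlapping max pooling (names hypothetical):

    % Forward: record the linear index of the "winning unit" in each block
    x = rand(6, 6);                        % input to the pooling layer
    P = 2;                                 % pooling block size
    [h, w] = size(x);
    y   = zeros(h/P, w/P);                 % pooled output
    idx = zeros(h/P, w/P);                 % index of each winner
    for i = 1:h/P
        for j = 1:w/P
            block = x((i-1)*P+1:i*P, (j-1)*P+1:j*P);
            [y(i, j), idx(i, j)] = max(block(:));
        end
    end

    % Backward: route each incoming gradient to the recorded winner only
    delta_out = rand(h/P, w/P);            % gradient at the pooled output
    delta_in  = zeros(h, w);
    for i = 1:h/P
        for j = 1:w/P
            [r, c] = ind2sub([P P], idx(i, j));
            delta_in((i-1)*P + r, (j-1)*P + c) = delta_out(i, j);
        end
    end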

Gradient routing is done in the following ways:

  • Max pooling - the error is assigned entirely to where it came from, i.e. the “winning unit”; the other units in the pooling block did not contribute to the output, so they are all assigned a gradient of zero.

  • Average pooling - the error is multiplied by 1/(P x P) and assigned uniformly to the whole pooling block (all units receive the same value), as in the sketch after this list.
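The average-pooling case needs no stored indices; a one-line MATLAB sketch, again assuming 2x2 blocks:

    P = 2;
    delta_out = rand(6, 6);                        % gradient at the pooled output
    delta_in  = kron(delta_out, ones(P) / P^2);    % each unit gets delta / (P*P)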

You can read a more comprehensive breakdown of the whole backpropagation procedure here.

Answered Nov 11 '22 by Jefkine Kafunah