 

Backpropagation algorithm through cross-channel local response normalization (LRN) layer

I am working on replicating a neural network. I'm trying to get an understanding of how the standard layer types work. In particular, I'm having trouble finding a description anywhere of how cross-channel normalisation layers behave on the backward-pass.

Since the normalization layer has no parameters, I could guess two possible options:

  1. The error gradients from the next (i.e. later) layer are passed backwards without doing anything to them.

  2. The error gradients are normalized in the same way the activations are normalized across channels in the forward pass.

I can't think of an intuitive reason to prefer one over the other, which is why I'd like some help on this.

EDIT1:

The layer is a standard layer in Caffe, as described here http://caffe.berkeleyvision.org/tutorial/layers.html (see 'Local Response Normalization (LRN)').

The layer's forward pass is described in section 3.3 of the AlexNet paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
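
For reference, the forward pass from section 3.3 of that paper, at a fixed spatial position and with N channels in total, is:

b_i = a_i / (k + alpha * sum(a_j ^ 2)) ^ beta

where the sum over j runs from max(0, i - n/2) to min(N - 1, i + n/2).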

EDIT2:

I believe the forward and backward pass algorithms are described in both the Torch library here: https://github.com/soumith/cudnn.torch/blob/master/SpatialCrossMapLRN.lua

and in the Caffe library here: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp

Could anyone who is familiar with either/both of these translate the method for the backward pass into plain English?

asked Nov 18 '15 by user1488804


People also ask

What is an LRN layer?

LRN is a non-trainable layer that square-normalizes the values in a feature map within a local neighborhood. There are two types of LRN, depending on how the neighborhood is defined: inter-channel LRN, which is what the AlexNet paper originally used, and intra-channel LRN.

What is cross channel normalization layer?

The cross-channel normalization operation uses local responses in different channels to normalize each activation. Cross-channel normalization typically follows a relu operation. Cross-channel normalization is also known as local response normalization.

Why do we normalize layers?

Layer normalization normalizes each of the inputs in the batch independently across all features. As batch normalization is dependent on batch size, it's not effective for small batch sizes. Layer normalization is independent of the batch size, so it can be applied to batches with smaller sizes as well.


1 Answer

The backward pass uses the chain rule to propagate the gradient through the local response normalization layer. In this sense it is similar to a nonlinearity layer, which also has no trainable parameters of its own but still affects the gradients flowing backwards.

From the Caffe code you linked to, I see that they take the error at each neuron as a parameter and compute the error for the previous layer as follows:

First, on the forward pass they cache a so-called scale, which is computed (in the notation of the AlexNet paper, see the formula in section 3.3) as:

scale_i = k + alpha / n * sum(a_j ^ 2)

Here and below, the sum is indexed by j and runs from max(0, i - n/2) to min(N - 1, i + n/2), where N is the number of channels.

(Note that the paper does not normalize by n, so I assume this is something Caffe does differently from AlexNet.) The forward pass is then computed as b_i = a_i * scale_i ^ -beta.
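
A rough sketch of that forward pass (a hypothetical NumPy helper, not Caffe's actual code; it treats a as a 1-D array of the N channel activations at one spatial position and uses the Caffe-style alpha / n scaling described above):

    import numpy as np

    def lrn_forward(a, n=5, k=1.0, alpha=1e-4, beta=0.75):
        # Cross-channel LRN at a single spatial position.
        # a: 1-D array of length N (one activation per channel).
        N = a.shape[0]
        scale = np.empty(N)
        for i in range(N):
            lo = max(0, i - n // 2)
            hi = min(N, i + n // 2 + 1)   # exclusive bound, i.e. j <= min(N - 1, i + n/2)
            scale[i] = k + (alpha / n) * np.sum(a[lo:hi] ** 2)
        b = a * scale ** (-beta)          # b_i = a_i * scale_i ^ -beta
        return b, scale                   # scale is cached for the backward pass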

To backward propagate the error, let's say that the error coming from the next layer is be_i, and the error that we need to compute is ae_i. Then ae_i is computed as:

ae_i = scale_i ^ -beta * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
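
The same computation as a sketch (a hypothetical lrn_backward, continuing the NumPy example above; be is the error dL/db coming from the next layer and the returned ae is dL/da):

    def lrn_backward(a, b, scale, be, n=5, alpha=1e-4, beta=0.75):
        # a, b and scale come from the forward pass; be is the upstream error.
        N = a.shape[0]
        ratio = be * b / scale            # be_j * b_j / scale_j
        ae = np.empty(N)
        for i in range(N):
            lo = max(0, i - n // 2)
            hi = min(N, i + n // 2 + 1)
            ae[i] = scale[i] ** (-beta) * be[i] \
                    - (2.0 * alpha * beta / n) * a[i] * np.sum(ratio[lo:hi])
        return ae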

Since you are planning to implement it manually, I will also share two tricks that Caffe uses in its code that make the implementation simpler:

  1. When you compute the addends for the sum, allocate an array of size N + n - 1, and pad it with n/2 zeros on each end. This way you can compute the sum from i - n/2 to i + n/2, without caring about going below zero and beyond N.

  2. You don't need to recompute the sum on each iteration. Instead, compute the addends in advance (a_j ^ 2 for the forward pass, be_j * b_j / scale_j for the backward pass), then compute the sum for i = 0, and for each consecutive i just add addend[i + n/2] and subtract addend[i - n/2 - 1]. This gives you the value of the sum for the new value of i in constant time (see the sketch after this list).
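
Here is a rough sketch of both tricks combined (a hypothetical helper, assuming an odd window size n; not Caffe's actual code):

    def padded_sliding_sums(addends, n=5):
        # addends: a_j ^ 2 for the forward pass, or be_j * b_j / scale_j for the backward pass.
        N = addends.shape[0]
        padded = np.zeros(N + n - 1)          # trick 1: n/2 zeros of padding on each end
        padded[n // 2 : n // 2 + N] = addends
        sums = np.empty(N)
        window = np.sum(padded[:n])           # the sum for i = 0
        sums[0] = window
        for i in range(1, N):                 # trick 2: slide the window in constant time
            window += padded[i + n - 1] - padded[i - 1]
            sums[i] = window
        return sums

For the forward pass you would multiply the returned sums by alpha / n and add k to get scale; for the backward pass you would plug them into the sum term of the ae_i formula above.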

answered by Ishamael