
How is backpropagation the same (or not) as reverse automatic differentiation?

The Wikipedia page for backpropagation has this claim:

The backpropagation algorithm for calculating a gradient has been rediscovered a number of times, and is a special case of a more general technique called automatic differentiation in the reverse accumulation mode.

Can someone expound on this and put it in layman's terms? What is the function being differentiated? What is the "special case"? Is it the adjoint values themselves that are used, or the final gradient?

Update: since writing this I have found that this is covered in the Deep Learning book, section 6.5.9. See https://www.deeplearningbook.org/ . I have also found this paper to be informative on the subject: "Stable architectures for deep neural networks" by Haber and Ruthotto.

asked May 06 '14 by Brannon

People also ask

Is backpropagation automatic differentiation?

Backpropagation is a special case of an extraordinarily powerful programming abstraction called automatic differentiation (AD).

What is the difference between backpropagation and reverse mode Autodiff?

I think the difference is that back-propagation refers to updating the weights using their gradients in order to minimize a function ("back-propagating the gradients" is the typical phrase), whereas reverse-mode differentiation refers only to calculating the gradient of a function.

Is backpropagation same as gradient descent?

Stochastic gradient descent is an optimization algorithm for minimizing the loss of a predictive model with regard to a training dataset. Back-propagation is an automatic differentiation algorithm for calculating gradients for the weights in a neural network graph structure.

What is the difference between backpropagation and Backpropagation through time?

The backpropagation algorithm is suited to feed-forward neural networks with fixed-size input-output pairs. Backpropagation Through Time is the application of the backpropagation training algorithm to sequence data, such as time series, by unrolling a recurrent neural network over its time steps.
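
For a rough sketch of that idea (using JAX here, which none of the answers mention; the tiny RNN, its sizes, and the data are made up purely for illustration), unrolling a recurrent network over its time steps and then differentiating the final scalar loss in reverse mode is exactly backpropagation through time:

    import jax
    import jax.numpy as jnp

    def rnn_loss(params, inputs, target):
        h = jnp.zeros(4)                                        # hidden state
        for x_t in inputs:                                      # unroll over the time steps
            h = jnp.tanh(params["Wx"] @ x_t + params["Wh"] @ h)
        return jnp.sum((params["Wo"] @ h - target) ** 2)        # scalar loss at the end

    params = {"Wx": 0.1 * jnp.ones((4, 2)),
              "Wh": 0.1 * jnp.ones((4, 4)),
              "Wo": 0.1 * jnp.ones((1, 4))}
    inputs = [jnp.ones(2)] * 3                                  # a length-3 input sequence
    target = jnp.zeros(1)

    grads = jax.grad(rnn_loss)(params, inputs, target)          # gradients flow backwards through time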


1 Answer

"What is the function being differentiated? What is the "special case?""

The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore the special case of reverse-mode AD where the output is scalar (m = 1).
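
As a rough illustration in JAX (JAX is not mentioned in the question or the answer; the functions f and loss below are invented for the example), reverse-mode AD gives you a vector-Jacobian product for a vector-valued function, and the gradient of a scalar-valued function is just the special case where the cotangent is the scalar 1:

    import jax
    import jax.numpy as jnp

    # Vector-valued function from R^3 to R^2: reverse-mode AD gives a vector-Jacobian product.
    def f(x):
        return jnp.array([jnp.sum(x ** 2), jnp.prod(x)])

    x = jnp.array([1.0, 2.0, 3.0])
    y, vjp_fn = jax.vjp(f, x)                 # f(x) plus a closure that maps v to v^T J
    v = jnp.array([1.0, 0.0])                 # a cotangent vector in R^2
    (vjp_x,) = vjp_fn(v)                      # v^T J, a vector in R^3

    # Scalar-valued function from R^3 to R: the special case m = 1.
    def loss(x):
        return jnp.sum(x ** 2)

    grad_x = jax.grad(loss)(x)                # backpropagation
    _, loss_vjp = jax.vjp(loss, x)
    (same_grad,) = loss_vjp(jnp.array(1.0))   # the VJP with cotangent 1 equals grad_x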

When we train neural networks, the loss is always a scalar-valued function of the network's parameters, so we are always using backpropagation; that scalar loss is the function being differentiated. And since backprop is a special case of reverse-mode AD, we are also using reverse-mode AD whenever we train a neural network.
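
Concretely (a minimal sketch with a made-up toy model, parameter names, and data), the function handed to reverse-mode AD during training is the scalar loss viewed as a function of the parameters:

    import jax
    import jax.numpy as jnp

    def predict(params, x):
        return x @ params["w"] + params["b"]              # a toy linear "network"

    def loss_fn(params, x, y):
        return jnp.mean((predict(params, x) - y) ** 2)    # scalar-valued: the function being differentiated

    params = {"w": jnp.ones((3, 1)), "b": jnp.zeros((1,))}
    x, y = jnp.ones((8, 3)), jnp.zeros((8, 1))

    grads = jax.grad(loss_fn)(params, x, y)               # reverse-mode AD of a scalar loss = backprop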

"Is it the adjoint values themselves that are used or the final gradient?"

The adjoint of a variable is the gradient of the loss function with respect to that variable. When we train a neural network, we use the gradients of the loss with respect to the parameters (weights, biases, etc.) to update those parameters. So we do use the adjoints, but only the adjoints of the parameters, and those adjoints are exactly the components of the final gradient.
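
A minimal sketch of that last point (again with invented names and data): the adjoint of a parameter w is the gradient of the loss with respect to w, and the parameter update consumes exactly that quantity:

    import jax
    import jax.numpy as jnp

    def loss_fn(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    w = jnp.ones(3)
    x, y = jnp.ones((8, 3)), jnp.zeros(8)

    adjoint_w = jax.grad(loss_fn)(w, x, y)   # adjoint of w = dLoss/dw
    w = w - 0.1 * adjoint_w                  # the update uses exactly these adjoints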

answered Sep 18 '22 by Nick McGreivy