The Wikipedia page for backpropagation has this claim:
The backpropagation algorithm for calculating a gradient has been rediscovered a number of times, and is a special case of a more general technique called automatic differentiation in the reverse accumulation mode.
Can someone expound on this and put it in layman's terms? What is the function being differentiated? What is the "special case"? Is it the adjoint values themselves that are used, or the final gradient?
Update: since writing this I have found that this is covered in the Deep Learning book, section 6.5.9. See https://www.deeplearningbook.org/ . I have also found this paper to be informative on the subject: "Stable architectures for deep neural networks" by Haber and Ruthotto.
Backpropagation is a special case of an extraordinarily powerful programming abstraction called automatic differentiation (AD).
I think the difference is that back-propagation usually refers to updating the weights using their gradients in order to minimize a loss function; "back-propagating the gradients" is a typical phrase. Reverse-mode differentiation, by contrast, refers only to calculating the gradient of a function.
Stochastic gradient descent is an optimization algorithm for minimizing the loss of a predictive model with respect to a training dataset. Back-propagation is an automatic differentiation algorithm for calculating the gradients of the loss with respect to the weights in a neural network's computational graph.
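Here is a minimal JAX sketch of that division of labor, using a hypothetical toy linear model and random data: reverse-mode AD (backprop) supplies the gradient, and SGD is the separate loop that consumes it to update the weights.

```python
import jax
import jax.numpy as jnp

# Toy linear model and squared-error loss (illustrative shapes and data).
def loss(w, x, y):
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

key = jax.random.PRNGKey(0)
key_x, key_y = jax.random.split(key)
x = jax.random.normal(key_x, (32, 3))   # 32 examples, 3 features
y = jax.random.normal(key_y, (32,))
w = jnp.zeros(3)

grad_loss = jax.grad(loss)              # reverse-mode AD / backprop: gives dloss/dw

lr = 0.1
for _ in range(100):                    # gradient descent: the optimizer that *uses* the gradient
    g = grad_loss(w, x, y)
    w = w - lr * g
```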
The backpropagation algorithm is suited to feed-forward neural networks with fixed-size input-output pairs. Backpropagation Through Time (BPTT) is the application of the backpropagation training algorithm to sequence data, such as time series, and is used to train recurrent neural networks.
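As a small sketch of BPTT (a toy scalar RNN with a made-up sequence, not a full implementation): the recurrence is unrolled over the time steps, and reverse-mode AD propagates the loss gradient back through every step of the unrolled computation.

```python
import jax
import jax.numpy as jnp

# Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t), loss on the final state.
def rnn_loss(params, xs, target):
    w_h, w_x = params
    h = 0.0
    for x in xs:                          # unroll the recurrence over the sequence
        h = jnp.tanh(w_h * h + w_x * x)
    return (h - target) ** 2

params = (jnp.array(0.5), jnp.array(0.5))
xs = [0.1, 0.2, 0.3, 0.4]                 # a short, made-up time series
grads = jax.grad(rnn_loss)(params, xs, 1.0)   # dloss/dw_h, dloss/dw_x via BPTT
```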
"What is the function being differentiated? What is the "special case?""
The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore a special case of reverse-mode AD for scalar functions.
When we train neural networks, we always have a scalar-valued loss function, so we are always using backpropagation; that loss is the function being differentiated. Since backprop is a special case of reverse-mode AD, we are also using reverse-mode AD whenever we train a neural network.
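A short JAX sketch of that distinction (the functions here are illustrative): jax.vjp handles a vector-valued function R^3 -> R^2 by computing vector-Jacobian products, while jax.grad only accepts a scalar-valued output, which is exactly the backpropagation special case.

```python
import jax
import jax.numpy as jnp

# Vector-valued function f: R^3 -> R^2 -- general reverse-mode AD territory.
def f(x):
    return jnp.array([x[0] * x[1], x[1] + x[2] ** 2])

x = jnp.array([1.0, 2.0, 3.0])
y, vjp_f = jax.vjp(f, x)                      # reverse-mode AD: a vector-Jacobian product
(row_grad,) = vjp_f(jnp.array([1.0, 0.0]))    # v^T J with v = e_1: gradient of the first output

# Scalar-valued loss L: R^3 -> R -- the backpropagation special case.
def L(x):
    return jnp.sum(f(x) ** 2)

g = jax.grad(L)(x)                            # equivalent to a VJP seeded with dL/dL = 1
```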
"Is it the adjoint values themselves that are used or the final gradient?"
The adjoint of a variable is the gradient of the loss function with respect to that variable. When we train a neural network, we use the gradients of the loss with respect to the parameters (weights, biases, etc.) to update those parameters. So we do use the adjoints, but only the adjoints of the parameters (which are exactly the gradients of the loss with respect to the parameters); the adjoints of intermediate activations are computed along the way but are not used directly in the update.
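To make the adjoint bookkeeping concrete, here is a hand-written backward pass for a tiny one-layer network (a sketch with made-up shapes and values): the reverse sweep produces an adjoint for every variable, but only the parameter adjoints dL/dW and dL/db feed the update.

```python
import jax.numpy as jnp

# Tiny network: z = x @ W + b, loss = mean((z - y)^2). Shapes are illustrative.
x = jnp.ones((4, 3))
y = jnp.zeros((4, 2))
W = jnp.full((3, 2), 0.1)
b = jnp.zeros(2)

# Forward pass, keeping intermediates.
z = x @ W + b
L = jnp.mean((z - y) ** 2)

# Reverse sweep: the adjoint of a variable is dL/d(variable).
z_adj = 2.0 * (z - y) / z.size        # adjoint of the intermediate z
W_adj = x.T @ z_adj                   # adjoint of the parameter W
b_adj = z_adj.sum(axis=0)             # adjoint of the parameter b
# x also gets an adjoint (x_adj = z_adj @ W.T), but it is not needed for training.

# Only the parameter adjoints are used in the update.
lr = 0.1
W = W - lr * W_adj
b = b - lr * b_adj
```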