
What is the difference between backpropagation and reverse-mode autodiff?

Going through this book, I am familiar with the following:

For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error.

However, I am not sure how this differs from the reverse-mode autodiff implemented by TensorFlow.

As far as I know, reverse-mode autodiff first goes through the graph in the forward direction and then, in a second pass, computes all partial derivatives of the outputs with respect to the inputs. This is very similar to the backpropagation algorithm.
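For concreteness, here is a minimal sketch of what I understand reverse-mode autodiff to look like in TensorFlow (assuming the TensorFlow 2.x eager API; the function and the input values are purely illustrative):

```python
import tensorflow as tf

x = tf.Variable(3.0)
y = tf.Variable(4.0)

# Forward pass: the tape records the operations as the function is evaluated.
with tf.GradientTape() as tape:
    f = x * x * y + y + 2.0   # f(x, y) = x^2 * y + y + 2

# Reverse pass: one sweep backwards through the recorded operations yields
# all partial derivatives of the output with respect to the inputs.
df_dx, df_dy = tape.gradient(f, [x, y])
print(df_dx.numpy(), df_dy.numpy())   # 24.0 (= 2xy), 10.0 (= x^2 + 1)
```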

How does backpropagation differ from reverse-mode autodiff?

asked Apr 19 '18 by rrz0

People also ask

What is the difference between backpropagation and forward propagation?

Forward propagation is the process of moving from the input layer (left) to the output layer (right) of the neural network. Moving in the opposite direction, i.e. backward from the output layer to the input layer, is called backward propagation.

What is the difference between backpropagation and Backpropagation through time?

The backpropagation algorithm is suited to feed-forward neural networks with fixed-size input-output pairs. Backpropagation Through Time is the application of the backpropagation training algorithm to sequence data, such as time series.

Is backpropagation automatic differentiation?

The backpropagation algorithm is a way to compute the gradients needed to fit the parameters of a neural network, in much the same way that gradients are used for other optimization problems. Backpropagation is a special case of an extraordinarily powerful programming abstraction called automatic differentiation (AD).

What is the difference between backpropagation and gradient descent?

Back-propagation is the process of calculating the derivatives, while gradient descent is the process of descending along the gradient, i.e. adjusting the parameters of the model to move down the loss function.


2 Answers

Thanks to David Parks for the valid contribution and useful links in his answer; however, I have since found an answer to this question from the author of the book himself, which may be more concise:

Backpropagation refers to the whole process of training an artificial neural network using multiple backpropagation steps, each of which computes gradients and uses them to perform a Gradient Descent step. In contrast, reverse-mode autodiff is simply a technique used to compute gradients efficiently, and it happens to be the one used by backpropagation.
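To make that distinction concrete, here is a minimal sketch of a single training step (assuming TensorFlow 2.x; the toy linear model, the variable names and the learning rate are just illustrative). The tape.gradient call is the reverse-mode autodiff part; backpropagation is the whole step, gradient computation plus the Gradient Descent update:

```python
import tensorflow as tf

# Toy linear model with illustrative parameters.
w = tf.Variable(tf.random.normal([2, 1]))
b = tf.Variable(tf.zeros([1]))
learning_rate = 0.01

def training_step(x_batch, y_batch):
    # Forward pass: make a prediction and measure the error (a scalar loss).
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x_batch, w) + b
        loss = tf.reduce_mean(tf.square(y_pred - y_batch))
    # Reverse-mode autodiff: one reverse pass computes all the gradients.
    dw, db = tape.gradient(loss, [w, b])
    # Gradient Descent step: slightly tweak the weights to reduce the error.
    w.assign_sub(learning_rate * dw)
    b.assign_sub(learning_rate * db)
    return loss

# Example usage with random data, repeated once per training iteration:
loss = training_step(tf.random.normal([8, 2]), tf.random.normal([8, 1]))
```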

answered Sep 28 '22 by rrz0


The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore a subset of reverse-mode AD.

When we train neural networks, we always have a scalar-valued loss function, so we are always using backpropagation. And since backprop is a subset of reverse-mode AD, we are also using reverse-mode AD whenever we train a neural network.
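To illustrate (a small sketch assuming TensorFlow 2.x; the function and the seed vector are purely illustrative): for a vector-valued function, reverse-mode AD gives a vector-Jacobian product v^T J in one reverse pass, while for a scalar-valued loss the same machinery yields the gradient, which is the backpropagation case.

```python
import tensorflow as tf

x = tf.Variable([3.0, 4.0])

with tf.GradientTape(persistent=True) as tape:
    # Vector-valued function f: R^2 -> R^3.
    y = tf.stack([x[0] * x[1], x[0] + x[1], x[0] ** 2])
    # Scalar-valued function R^2 -> R (a toy "loss").
    loss = tf.reduce_sum(y)

# Reverse-mode AD: vector-Jacobian product v^T J in a single reverse pass.
v = tf.constant([1.0, 0.0, 0.0])
vjp = tape.gradient(y, x, output_gradients=v)
print(vjp.numpy())    # [4. 3.] -- the gradient of the first output, x0 * x1

# Backpropagation case: scalar output, so the reverse pass gives the gradient.
grad = tape.gradient(loss, x)
print(grad.numpy())   # [11.  4.]
```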

Whether backpropagation takes the more general definition (reverse-mode AD applied to a scalar loss function) or the more specific definition (reverse-mode AD applied to a scalar loss function for training neural networks) is a matter of personal taste. It's a word that has slightly different meanings in different contexts, but it is most commonly used in the machine learning community to mean computing the gradients of neural network parameters with respect to a scalar loss function.

For completeness: sometimes reverse-mode AD can compute the full Jacobian in a single reverse pass, not just a vector-Jacobian product. Also, the vector-Jacobian product of a scalar-valued function, with the vector taken to be [1.0], is the same as the gradient.

answered Sep 28 '22 by Nick McGreivy