Going through this book, I am familiar with the following:
For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error.
However, I am not sure how this differs from the reverse-mode autodiff implementation used by TensorFlow.
As far as I know, reverse-mode autodiff first goes through the graph in the forward direction and then, in a second pass, computes all the partial derivatives of the outputs with respect to the inputs. This is very similar to the backpropagation algorithm.
How does backpropagation differ from reverse-mode autodiff?
Forward propagation is the process of moving from the input layer (left) to the output layer (right) of a neural network. Moving in the opposite direction, i.e. backward from the output layer to the input layer, is called backward propagation.
The backpropagation algorithm is suited to feedforward neural networks operating on fixed-size input-output pairs. Backpropagation Through Time is the application of the backpropagation training algorithm to sequence data such as time series.
The backpropagation algorithm is a way to compute the gradients needed to fit the parameters of a neural network, in much the same way we have used gradients for other optimization problems. Backpropagation is a special case of an extraordinarily powerful programming abstraction called automatic differentiation (AD).
Backpropagation is the process of calculating the derivatives, and gradient descent is the process of descending along the gradient, i.e. adjusting the parameters of the model to move down the loss function.
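As a rough sketch of that split (the parameter value, toy loss, and learning rate below are made up for illustration), TensorFlow's GradientTape keeps the two steps visibly separate:

```python
import tensorflow as tf

w = tf.Variable(3.0)                 # a single model parameter (illustrative)

with tf.GradientTape() as tape:
    loss = (w - 1.0) ** 2            # forward pass: compute a toy scalar loss

grad = tape.gradient(loss, w)        # backpropagation: reverse pass computes d(loss)/dw

learning_rate = 0.1                  # made-up step size
w.assign_sub(learning_rate * grad)   # gradient descent: move w down the gradient
```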
Thanks to David Parks for the valid contribution and useful links; however, I have since found an answer to this question from the author of the book himself, which may be more concise:
Backpropagation refers to the whole process of training an artificial neural network using multiple backpropagation steps, each of which computes gradients and uses them to perform a Gradient Descent step. In contrast, reverse-mode autodiff is simply a technique used to compute gradients efficiently, and it happens to be used by backpropagation.
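To make that distinction concrete, here is a minimal sketch (assuming a toy model y = w * x, made-up data, and a made-up learning rate): the whole training loop is what the book calls backpropagation, while reverse-mode autodiff is only the gradient computation inside it.

```python
import tensorflow as tf

w = tf.Variable(0.0)                                     # toy model: y = w * x
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
xs = tf.constant([1.0, 2.0, 3.0])                        # made-up training data
ys = tf.constant([2.0, 4.0, 6.0])

for step in range(100):                                  # "backpropagation": the whole training process
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * xs - ys) ** 2)        # forward pass
    grads = tape.gradient(loss, [w])                     # reverse-mode autodiff: the gradient computation
    optimizer.apply_gradients(zip(grads, [w]))           # Gradient Descent step
```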
The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore a subset of reverse-mode AD.
When we train neural networks, we always have a scalar-valued loss function, so we are always using backpropagation. Since backprop is a subset of reverse-mode AD, then we are also using reverse-mode AD when we train a neural network.
Whether backpropagation means the more general notion of reverse-mode AD applied to a scalar loss function, or the more specific one of reverse-mode AD applied to a scalar loss function for training neural networks, is a matter of personal taste. It's a word with slightly different meanings in different contexts, but in the machine learning community it is most commonly used to mean computing the gradients of neural network parameters with respect to a scalar loss function.
For completeness: sometimes reverse-mode AD can compute the full Jacobian in a single reverse pass, not just a vector-Jacobian product. Also, for a scalar-valued function, the vector-Jacobian product where the vector is simply [1.0] is the same as the gradient.
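A small sketch of that last point (the functions below are made-up toys; I'm assuming TensorFlow's GradientTape, whose output_gradients argument plays the role of the vector in the vector-Jacobian product):

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape(persistent=True) as tape:
    y = tf.stack([tf.reduce_sum(x ** 2), tf.reduce_prod(x)])  # vector-valued f: R^3 -> R^2
    loss = tf.reduce_sum(x ** 2)                               # scalar-valued g: R^3 -> R

# Reverse-mode AD computes v^T J in one backward pass; v selects a combination of outputs.
v = tf.constant([1.0, 0.0])
vjp = tape.gradient(y, x, output_gradients=v)   # first row of the Jacobian of f

# For a scalar output the vector is just [1.0], so the VJP is the gradient itself.
grad = tape.gradient(loss, x)                   # same as output_gradients=tf.constant(1.0)
del tape
```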