Going through this book, I am familiar with the following:
For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error.
However, I am not sure how this differs from the reverse-mode autodiff implementation used by TensorFlow.
As far as I know, reverse-mode autodiff first goes through the graph in the forward direction and then, in a second pass, computes all the partial derivatives of the outputs with respect to the inputs. This is very similar to the backpropagation algorithm.
How does backpropagation differ from reverse-mode autodiff?
Forward propagation is the process of moving from the input layer (left) to the output layer (right) of a neural network. Moving in the opposite direction, i.e. backward from the output layer to the input layer, is called backward propagation.
The backpropagation algorithm is suited to feedforward neural networks operating on fixed-size input-output pairs. Backpropagation Through Time is the application of the backpropagation training algorithm to sequence data such as time series.
The backpropagation algorithm is a way to compute the gradients needed to fit the parameters of a neural network, in much the same way we have used gradients for other optimization problems. Backpropagation is a special case of an extraordinarily powerful programming abstraction called automatic differentiation (AD).
Backpropagation is the process of calculating the derivatives, and gradient descent is the process of descending along the gradient, i.e. adjusting the parameters of the model to move down the loss function.
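As a rough sketch of that split (the parameter value, toy loss, and learning rate below are made up for illustration), TensorFlow's GradientTape keeps the two steps visibly separate:

```python
import tensorflow as tf

w = tf.Variable(3.0)                 # a single model parameter (illustrative)

with tf.GradientTape() as tape:
    loss = (w - 1.0) ** 2            # forward pass: compute a toy scalar loss

grad = tape.gradient(loss, w)        # backpropagation: reverse pass computes d(loss)/dw

learning_rate = 0.1                  # made-up step size
w.assign_sub(learning_rate * grad)   # gradient descent: move w down the gradient
```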
Thanks to David Parks for the valid contribution and useful links; however, I have since found an answer to this question from the author of the book himself, which may be more concise:
Backpropagation refers to the whole process of training an artificial neural network using multiple backpropagation steps, each of which computes gradients and uses them to perform a Gradient Descent step. In contrast, reverse-mode autodiff is simply a technique used to compute gradients efficiently, and it happens to be used by backpropagation.
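To make that distinction concrete, here is a minimal sketch (assuming a toy model y = w * x, made-up data, and a made-up learning rate): the whole training loop is what the book calls backpropagation, while reverse-mode autodiff is only the gradient computation inside it.

```python
import tensorflow as tf

w = tf.Variable(0.0)                                     # toy model: y = w * x
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
xs = tf.constant([1.0, 2.0, 3.0])                        # made-up training data
ys = tf.constant([2.0, 4.0, 6.0])

for step in range(100):                                  # "backpropagation": the whole training process
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * xs - ys) ** 2)        # forward pass
    grads = tape.gradient(loss, [w])                     # reverse-mode autodiff: the gradient computation
    optimizer.apply_gradients(zip(grads, [w]))           # Gradient Descent step
```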
The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore a subset of reverse-mode AD.
When we train neural networks, we always have a scalar-valued loss function, so we are always using backpropagation. Since backprop is a subset of reverse-mode AD, then we are also using reverse-mode AD when we train a neural network.
Whether backpropagation means the more general notion of reverse-mode AD applied to a scalar loss function, or the more specific one of reverse-mode AD applied to a scalar loss function for training neural networks, is a matter of personal taste. It's a word with slightly different meanings in different contexts, but in the machine learning community it is most commonly used to mean computing the gradients of neural network parameters with respect to a scalar loss function.
For completeness: sometimes reverse-mode AD can compute the full Jacobian in a single reverse pass, not just a vector-Jacobian product. Also, for a scalar-valued function, the vector-Jacobian product where the vector is simply [1.0] is the same as the gradient.
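A small sketch of that last point (the functions below are made-up toys; I'm assuming TensorFlow's GradientTape, whose output_gradients argument plays the role of the vector in the vector-Jacobian product):

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape(persistent=True) as tape:
    y = tf.stack([tf.reduce_sum(x ** 2), tf.reduce_prod(x)])  # vector-valued f: R^3 -> R^2
    loss = tf.reduce_sum(x ** 2)                               # scalar-valued g: R^3 -> R

# Reverse-mode AD computes v^T J in one backward pass; v selects a combination of outputs.
v = tf.constant([1.0, 0.0])
vjp = tape.gradient(y, x, output_gradients=v)   # first row of the Jacobian of f

# For a scalar output the vector is just [1.0], so the VJP is the gradient itself.
grad = tape.gradient(loss, x)                   # same as output_gradients=tf.constant(1.0)
del tape
```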