What is the difference between SGD and back-propagation?

1 Answer

Backpropagation is an efficient method for computing gradients in directed graphs of computations, such as neural networks. It is not a learning method but rather a neat computational trick that is often used inside learning methods. It is essentially a straightforward application of the chain rule of differentiation. It lets you compute all the required partial derivatives in time linear in the size of the graph. By contrast, naively expanding the chain rule separately along every path can scale exponentially with depth, because the number of paths through the graph grows exponentially.
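
To make the chain-rule idea concrete, here is a minimal reverse-mode sketch in Python. The `Value` class and its methods are hypothetical and written only for illustration; they are not any library's API. During the forward pass, each node records the local derivative of each operation. A single backward sweep then accumulates gradients. The result is checked against central finite differences.

```python
import math


class Value:
    """Hypothetical minimal scalar node in a computation graph (illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # partial derivative of this node w.r.t. each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Order nodes so that every node comes after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Single reverse sweep: each edge is used once (chain rule), so the
        # cost is linear in the size of the graph.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# A tiny two-layer "network": y = tanh(w2 * tanh(w1 * x))
w1, w2, x = Value(0.5), Value(-1.5), 2.0
y = (w2 * (w1 * x).tanh()).tanh()
y.backward()


def f(a, b):
    # The same function on plain floats, used only for the numerical check.
    return math.tanh(b * math.tanh(a * x))


# Compare with central finite differences.
eps = 1e-6
num_w1 = (f(0.5 + eps, -1.5) - f(0.5 - eps, -1.5)) / (2 * eps)
num_w2 = (f(0.5, -1.5 + eps) - f(0.5, -1.5 - eps)) / (2 * eps)
print(w1.grad, num_w1)
print(w2.grad, num_w2)
```

Note that nothing here updates `w1` or `w2`. Backprop only produces the gradients; deciding what to do with them is the optimizer's job.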

SGD is one of many optimization methods. Specifically, it is a first-order optimizer, meaning that it relies only on the gradient of the objective. Consequently, for neural networks it is often applied together with backprop, which supplies those gradients efficiently. You could also apply SGD to gradients obtained in a different way (from sampling, numerical approximation, etc.). Symmetrically, you can use other optimization techniques with backprop as well: anything that can make use of a gradient or Jacobian.
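
The following sketch illustrates that SGD does not depend on backprop. It runs plain minibatch SGD on a toy linear-regression problem, but takes its gradients from central finite differences rather than from backprop. The data, learning rate, and helper names (`minibatch_loss`, `numerical_grad`, `sgd_step`) are assumptions invented for this example.

```python
import random


def minibatch_loss(params, batch):
    # Mean squared error of a linear model y = w * x + b on a minibatch.
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)


def numerical_grad(loss_fn, params, batch, eps=1e-5):
    # Central finite differences: no backprop involved at all.
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grad.append((loss_fn(up, batch) - loss_fn(down, batch)) / (2 * eps))
    return grad


def sgd_step(params, grad, lr):
    # The SGD update only needs a gradient estimate; it ignores how it was computed.
    return [p - lr * g for p, g in zip(params, grad)]


random.seed(0)
# Hypothetical noise-free toy data on the line y = 3x + 1.
data = [(k / 10, 3.0 * (k / 10) + 1.0) for k in range(-20, 21)]
params = [0.0, 0.0]
for step in range(2000):
    batch = random.sample(data, 8)  # the "stochastic" part: a random minibatch
    params = sgd_step(params, numerical_grad(minibatch_loss, params, batch), lr=0.05)

print(params)  # should approach [3.0, 1.0]
```

Finite differences need two loss evaluations per parameter. That is why, for networks with millions of weights, backprop is the practical gradient source to pair with SGD.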

This common misconception comes from the fact that, for simplicity, people sometimes say "trained with backprop". If they do not specify an optimizer, this actually means "trained with SGD, using backprop as the gradient-computing technique". Also, in old textbooks you can find terms like the "delta rule" and other somewhat confusing names that describe exactly the same thing. This is because the neural network community was, for a long time, fairly independent of the general optimization community.
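
For the "delta rule" in particular, a short derivation shows the equivalence for a single linear unit. The notation is assumed for illustration: weights $\mathbf{w}$, input $\mathbf{x}$, output $y = \mathbf{w}^\top\mathbf{x}$, target $t$, and learning rate $\eta$.

```latex
\begin{aligned}
E(\mathbf{w}) &= \tfrac{1}{2}\,(t - \mathbf{w}^\top \mathbf{x})^2,
\qquad
\frac{\partial E}{\partial \mathbf{w}} = -(t - y)\,\mathbf{x},\\[4pt]
\mathbf{w} &\leftarrow \mathbf{w} - \eta\,\frac{\partial E}{\partial \mathbf{w}}
= \mathbf{w} + \eta\,(t - y)\,\mathbf{x}.
\end{aligned}
```

The final line is the delta rule (also known as the Widrow-Hoff or LMS rule) for a linear unit. It is exactly one SGD step on the per-example squared error.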

Thus you have two layers of abstraction:

- gradient computation: where backprop comes into play
- optimization: where techniques like SGD, Adam, Rprop, BFGS, etc. come into play, which (if they are first order or higher) use the gradient computed above
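
The two layers can be sketched as separate, swappable pieces. One is a function that returns the gradient, which is the role backprop plays for a network. The others are optimizers that consume that gradient. This is a toy example under stated assumptions. The quadratic loss, its hand-derived gradient, and the `SGD`/`Adam` classes are hypothetical stand-ins, not any library's API.

```python
import math


def loss_and_grad(w):
    # Layer 1: gradient computation. Derived by hand with the chain rule here;
    # for a neural network this is what backprop would return.
    a, b = w[0] - 1.0, w[1] + 2.0
    loss = a * a + 10.0 * b * b
    return loss, [2.0 * a, 20.0 * b]


class SGD:
    def __init__(self, lr):
        self.lr = lr

    def step(self, w, g):
        return [wi - self.lr * gi for wi, gi in zip(w, g)]


class Adam:
    # Standard Adam update with bias-corrected moment estimates.
    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = None
        self.v = None
        self.t = 0

    def step(self, w, g):
        if self.m is None:
            self.m = [0.0] * len(w)
            self.v = [0.0] * len(w)
        self.t += 1
        out = []
        for i, (wi, gi) in enumerate(zip(w, g)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * gi
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * gi * gi
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            out.append(wi - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


def train(optimizer, steps=2000):
    # Layer 2: optimization. Any object with a step(w, g) method can be plugged in.
    w = [0.0, 0.0]
    for _ in range(steps):
        _, g = loss_and_grad(w)
        w = optimizer.step(w, g)
    return w


w_sgd = train(SGD(lr=0.04))
w_adam = train(Adam(lr=0.05))
print(w_sgd, w_adam)  # both should be close to [1.0, -2.0]
```

The gradient function never needs to know which optimizer is used, and vice versa. This mirrors how a framework keeps autodiff and optimizers separate.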

answered Sep 20 '22 11:09

lejlot



Tags:

artificial-intelligence

machine-learning

backpropagation

difference

gradient-descent

Влад Концевич
