 

What is the difference between SGD and back-propagation?

Can you please tell me the difference between Stochastic Gradient Descent (SGD) and back-propagation?

asked Jun 21 '16 by Влад Концевич


People also ask

Does SGD use backpropagation?

SGD is one of many optimization methods, namely a first-order optimizer, meaning that it is based on analysis of the gradient of the objective. Consequently, in terms of neural networks it is often applied together with backprop to make efficient updates.

What is the difference between propagation and backpropagation?

Forward propagation is the way to move from the input layer (left) to the output layer (right) in the neural network. The process of moving from right to left, i.e. backward from the output layer to the input layer, is called backward propagation.

What is the difference between gradient descent and stochastic gradient descent?

The only difference comes while iterating. In gradient descent, we consider all the points when calculating the loss and its derivative, while in stochastic gradient descent we use a single, randomly chosen point for the loss function and its derivative.
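
As a rough illustration of that difference (my own sketch, not from the original question or answers, with made-up data and names), the following Python snippet fits a toy linear-regression problem twice: full-batch gradient descent uses every point per update, SGD uses one random point per update.

    import numpy as np

    # Toy data: y is roughly 3*x plus noise (values are illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

    def grad(w, X_batch, y_batch):
        # Gradient of the mean squared error w.r.t. w over the given batch.
        residual = X_batch @ w - y_batch
        return 2.0 * X_batch.T @ residual / len(y_batch)

    lr = 0.1
    w_gd = np.zeros(1)   # full-batch gradient descent
    w_sgd = np.zeros(1)  # stochastic gradient descent

    for step in range(200):
        # Gradient descent: loss and derivative use ALL points.
        w_gd -= lr * grad(w_gd, X, y)

        # SGD: loss and derivative use a single randomly chosen point.
        i = rng.integers(len(y))
        w_sgd -= lr * grad(w_sgd, X[i:i + 1], y[i:i + 1])

    print(w_gd, w_sgd)  # both should end up close to 3.0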

Is backpropagation an optimization algorithm?

A lot of the time you hear backpropagation being called an optimization technique: it is an algorithm that uses gradient descent (detailed below) to minimize error in the predictions of machine learning models.


1 Answer

Backpropagation is an efficient method of computing gradients in directed graphs of computations, such as neural networks. It is not a learning method, but rather a nice computational trick which is often used in learning methods. It is actually a simple implementation of the chain rule of derivatives, which gives you the ability to compute all required partial derivatives in linear time in terms of the graph size (while naive gradient computation would scale exponentially with depth).
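
To make the "chain rule on a computation graph" point concrete, here is a minimal hand-worked sketch (my own illustrative example, not part of the original answer): the forward pass caches intermediate values, and the backward pass reuses them so every partial derivative comes out of one backward sweep.

    import math

    # Tiny computation graph: out = sigmoid(w * x).
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x, w = 2.0, -0.5

    # Forward pass: compute and cache the intermediate values.
    z = w * x           # intermediate node
    out = sigmoid(z)    # output node

    # Backward pass: apply the chain rule once per node, reusing 'out'.
    dout_dz = out * (1.0 - out)   # derivative of sigmoid at z
    dout_dw = dout_dz * x         # chain rule: d(out)/dw = d(out)/dz * dz/dw
    dout_dx = dout_dz * w         # chain rule: d(out)/dx = d(out)/dz * dz/dx

    print(dout_dw, dout_dx)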

SGD is one of many optimization methods, namely a first-order optimizer, meaning that it is based on analysis of the gradient of the objective. Consequently, in terms of neural networks it is often applied together with backprop to make efficient updates. You could also apply SGD to gradients obtained in a different way (from sampling, numerical approximators, etc.). Symmetrically, you can use other optimization techniques with backprop as well - anything that can use a gradient/Jacobian.
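
As a sketch of that decoupling (again my own illustrative code, with made-up helper names such as sgd_step and numerical_grad): the SGD update only consumes a gradient vector, and here that gradient comes from finite differences rather than backprop.

    import numpy as np

    def sgd_step(params, gradient, lr=0.1):
        # The SGD update itself: params <- params - lr * gradient.
        # It does not care where 'gradient' came from.
        return params - lr * gradient

    def numerical_grad(f, params, eps=1e-6):
        # Finite-difference gradient approximation -- no backprop involved.
        g = np.zeros_like(params)
        for i in range(params.size):
            bump = np.zeros_like(params)
            bump[i] = eps
            g[i] = (f(params + bump) - f(params - bump)) / (2 * eps)
        return g

    # Minimise f(p) = ||p - 3||^2 with SGD driven by numerical gradients.
    f = lambda p: float(np.sum((p - 3.0) ** 2))
    p = np.zeros(2)
    for _ in range(200):
        p = sgd_step(p, numerical_grad(f, p))
    print(p)  # approaches [3., 3.]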

This common misconception comes from the fact that, for simplicity, people sometimes say "trained with backprop", which actually means (if they do not specify the optimizer) "trained with SGD, using backprop as the gradient-computing technique". Also, in old textbooks you can find things like the "delta rule" and other somewhat confusing terms, which describe exactly the same thing (as the neural network community was for a long time somewhat independent from the general optimization community).

Thus you have two layers of abstraction:

  • gradient computation - where backprop comes into play
  • optimization level - where techniques like SGD, Adam, Rprop, BFGS etc. come into play, which (if they are first order or higher) use the gradient computed above (see the sketch after this list)
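
For a concrete illustration of these two layers (an assumed, framework-specific example of mine, not from the original answer), here is how they typically appear in PyTorch: loss.backward() is the gradient-computation layer (backprop), and the optimizer object is the optimization layer.

    import torch

    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # could be Adam, Rprop, ...

    x = torch.randn(32, 10)
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()    # layer 1: gradient computation via backprop
    optimizer.step()   # layer 2: the optimizer updates parameters from those gradients
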
answered Sep 20 '22 by lejlot