 

What is the difference between Gradient Descent and Newton's Method?

I understand what gradient descent does: basically, it tries to move towards a local optimum by slowly moving down the curve. I am trying to understand the actual difference between plain gradient descent and Newton's method.

From Wikipedia, I read this short line: "Newton's method uses curvature information to take a more direct route." What does this mean intuitively?

asked Aug 22 '12 by London guy

People also ask

What is the difference between gradient descent and Newton's method?

Gradient descent algorithms find local minima by moving along the direction of steepest descent while Newton's method takes into account curvature information and thereby often improves convergence.

Is Newton's method better than gradient descent?

After reviewing a set of lectures on convex optimization, Newton's method seems to be a far superior algorithm to gradient descent for finding globally optimal solutions: it can provide a guarantee for its solution, it is affine invariant, and above all it converges in far fewer steps.

Does gradient descent use Newton's method?

Newton's method has stronger constraints on the differentiability of the function than gradient descent. If the second derivative of the function is undefined at the function's root, then we can apply gradient descent to it but not Newton's method.

What is the difference between gradient descent and steepest descent?

Summary: the gradient points in the direction of steepest ascent. The direction of steepest descent (or ascent) is the direction, amongst all nearby directions, that lowers (or raises) the value of f the most.


2 Answers

At a local minimum (or maximum) x, the derivative of the target function f vanishes: f'(x) = 0 (assuming sufficient smoothness of f).

Gradient descent tries to find such a minimum x by using information from the first derivative of f: It simply follows the steepest descent from the current point. This is like rolling a ball down the graph of f until it comes to rest (while neglecting inertia).
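As a rough illustration, here is a minimal one-dimensional gradient descent sketch in Python; the example function, step size, and tolerance are made-up choices for illustration, not anything from the answer itself:

```python
# A minimal 1-D gradient descent sketch (hypothetical example).
def gradient_descent(df, x, step=0.1, tol=1e-8, max_iter=10_000):
    """Follow the negative derivative df downhill from the starting point x."""
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:      # f'(x) ~ 0: we are (near) a stationary point
            break
        x -= step * g         # small step in the steepest-descent direction
    return x

# Example: f(x) = (x - 3)**2, so f'(x) = 2*(x - 3); the minimum is at x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x=0.0))  # -> approximately 3.0
```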

Newton's method tries to find a point x satisfying f'(x) = 0 by approximating f' with a linear function g and then solving for the root of that function explicitly (this is Newton's root-finding method, applied to f'). The root of g is not necessarily the root of f', but under many circumstances it is a good guess (the Wikipedia article on Newton's method for root finding has more information on convergence criteria). In approximating f', Newton's method makes use of f'' (the curvature of f). This means it has higher requirements on the smoothness of f, but it also means that (by using more information) it often converges faster.
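For comparison, a minimal sketch of Newton's method applied to the same made-up example; note that it needs the second derivative f'' in addition to f':

```python
# A minimal 1-D Newton's method sketch for solving f'(x) = 0 (hypothetical example).
def newton(df, d2f, x, tol=1e-8, max_iter=100):
    """Repeatedly jump to the root of the linear approximation g of df."""
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break
        x -= g / d2f(x)       # full Newton step: the exact root of the tangent of f'
    return x

# Same example: f(x) = (x - 3)**2, f'(x) = 2*(x - 3), f''(x) = 2.
print(newton(lambda x: 2 * (x - 3), lambda x: 2.0, x=0.0))  # -> 3.0 in one step
```

Because f' is linear in this example, a single Newton step lands exactly on the root, whereas gradient descent needs many small steps to get there.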

answered Sep 22 '22 by Florian Brucker


Put simply: with gradient descent you take a small step towards where you think the zero is and then recalculate; with Newton's method, you go all the way there.
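A tiny numeric illustration of that difference, using the made-up quadratic f(x) = x**2 (so f'(x) = 2x and f''(x) = 2) starting from x = 4:

```python
# One update each on the made-up function f(x) = x**2, starting from x = 4.
x0, step = 4.0, 0.1
df = lambda x: 2 * x        # f'(x)
d2f = lambda x: 2.0         # f''(x)

x_gd = x0 - step * df(x0)          # gradient descent: small step, x = 3.2
x_newton = x0 - df(x0) / d2f(x0)   # Newton: straight to the minimum, x = 0.0
print(x_gd, x_newton)
```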

answered Sep 18 '22 by dashnick