What is the difference between Gradient Descent and Newton's Gradient Descent?

I understand what gradient descent does: it tries to move towards a locally optimal solution by slowly moving down the curve. What is the actual difference between plain gradient descent and Newton's method?

From Wikipedia, I read this short line "Newton's method uses curvature information to take a more direct route." What does this intuitively mean?

asked Aug 22 '12 05:08

London guy

2 Answers

At a local minimum (or maximum) x, the derivative of the target function f vanishes: f'(x) = 0 (assuming sufficient smoothness of f).

Gradient descent tries to find such a minimum x by using information from the first derivative of f: It simply follows the steepest descent from the current point. This is like rolling a ball down the graph of f until it comes to rest (while neglecting inertia).
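As a rough illustration of the description above, here is a minimal one-dimensional sketch. The test function f(x) = (x - 2)^2 + 1, the fixed step size of 0.1, the starting point, and the name `gradient_descent` are all assumptions chosen for the example; they are not part of the original answer.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Follow the negative first derivative until it (nearly) vanishes.

    grad: function returning f'(x); step: fixed step size (an assumption,
    real implementations often use a line search or a schedule instead).
    Returns the final x and the number of iterations used.
    """
    x = x0
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:      # f'(x) is close to 0: we are at rest
            return x, i
        x -= step * g         # small step in the direction of steepest descent
    return x, max_iter


# Hypothetical test function: f(x) = (x - 2)**2 + 1, so f'(x) = 2 * (x - 2).
x_min, iters = gradient_descent(lambda x: 2.0 * (x - 2.0), x0=10.0)
print(x_min, iters)
```

Only f' is used here, and each iteration removes a fixed fraction of the remaining distance, so many small steps are needed before the derivative is close to zero.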

Newton's method tries to find a point x satisfying f'(x) = 0 by approximating f' with a linear function g and then solving for the root of that function explicitly (this is called Newton's root-finding method). The root of g is not necessarily the root of f', but it is under many circumstances a good guess (the Wikipedia article on Newton's method for root finding has more information on convergence criteria). While approximating f', Newton's method makes use of f'' (the curvature of f). This means it has higher requirements on the smoothness of f, but it also means that (by using more information) it often converges faster.
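The same idea can be sketched in one dimension. Linearizing f' at the current point and solving for the root of that line gives the update x_new = x - f'(x) / f''(x). The test function f(x) = x^2 + e^x, the starting point, and the name `newton_minimize` are assumptions made for this example, and the sketch has no safeguards (it would fail where f''(x) = 0).

```python
import math


def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Apply Newton's root-finding method to f' in order to find f'(x) = 0.

    grad returns f'(x) and hess returns f''(x). Returns the final x and
    the number of iterations used.
    """
    x = x0
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, i
        # Root of the linear approximation of f' around the current x.
        x -= g / hess(x)
    return x, max_iter


# Hypothetical test function: f(x) = x**2 + exp(x), which is convex.
f_prime = lambda x: 2.0 * x + math.exp(x)
f_second = lambda x: 2.0 + math.exp(x)

x_star, n_iters = newton_minimize(f_prime, f_second, x0=1.0)
print(x_star, n_iters)
```

On this function the derivative drops below the tolerance after only a handful of iterations, which is the faster convergence mentioned above. The price is that f'' has to exist and has to be evaluated at every step.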

answered Sep 22 '22 15:09

Florian Brucker

Put simply: with gradient descent you take a small step towards where you think the zero is and then recalculate; with Newton's method, you go all the way there.
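A single step of each method from the same starting point makes this concrete. The quadratic f(x) = (x - 2)^2 + 1, the step size of 0.1, and the starting point are assumptions picked for illustration. Note that "all the way there" is exact only because f is quadratic; on a general function, a Newton step lands on the minimum of a local quadratic model, which is only an estimate of the true minimum.

```python
def grad(x):
    """f'(x) for the hypothetical function f(x) = (x - 2)**2 + 1."""
    return 2.0 * (x - 2.0)


def hess(x):
    """f''(x) for the same function; constant because f is quadratic."""
    return 2.0


x0 = 10.0
step = 0.1  # assumed fixed step size for gradient descent

gd_next = x0 - step * grad(x0)          # small move downhill, still far from 2
newton_next = x0 - grad(x0) / hess(x0)  # jumps to where the model says f' = 0

print(gd_next, newton_next)
```

After one step, gradient descent has moved from 10 to roughly 8.4, while Newton's method is already at the minimum x = 2.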

answered Sep 18 '22 15:09

dashnick

Tags: machine-learning, mathematical-optimization, data-mining, gradient-descent, newtons-method
