I learnt gradient descent through online resources (namely the Machine Learning course on Coursera). However, the information provided only said to repeat gradient descent until it converges.
Their definition of convergence was to plot the cost function against the number of iterations and watch when the graph flattens out. Therefore I assume that I would do the following:
if (change_in_costfunction > precisionvalue) {
    repeat gradient_descent
}
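Something like this rough Python sketch of the cost-based check (the quadratic cost, gradient, and learning rate below are placeholders I made up for illustration, not anything from the course):

def gradient_descent(theta, grad, cost, lr=0.1, precision=1e-6, max_iters=10000):
    # Repeat the update until the change in the cost function drops below `precision`.
    prev_cost = cost(theta)
    for _ in range(max_iters):
        theta = theta - lr * grad(theta)
        new_cost = cost(theta)
        if abs(prev_cost - new_cost) <= precision:
            break
        prev_cost = new_cost
    return theta

# Example: minimise f(theta) = (theta - 3)^2, whose minimum is at theta = 3
cost = lambda t: (t - 3.0) ** 2
grad = lambda t: 2.0 * (t - 3.0)
print(gradient_descent(0.0, grad, cost))   # prints a value close to 3.0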
Alternatively, I was wondering if another way to determine convergence is to watch each coefficient approach its true value:
if (change_in_coefficient_j > precisionvalue) {
    repeat gradient_descent_for_j
}
...repeat for all coefficients
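A corresponding sketch of the coefficient-based check, again with made-up names, which stops only once every coefficient moves by less than the precision value:

def gradient_descent_by_coeff(theta, grad, lr=0.1, precision=1e-6, max_iters=10000):
    # theta is a list of coefficients; stop once no coefficient changes by more than `precision`.
    for _ in range(max_iters):
        g = grad(theta)
        new_theta = [t - lr * gi for t, gi in zip(theta, g)]
        if all(abs(nt - t) <= precision for nt, t in zip(new_theta, theta)):
            return new_theta
        theta = new_theta
    return theta

# Example: minimise f(a, b) = a^2 + (b - 1)^2, whose minimum is at (0, 1)
grad = lambda th: [2.0 * th[0], 2.0 * (th[1] - 1.0)]
print(gradient_descent_by_coeff([5.0, 5.0], grad))   # prints values close to [0.0, 1.0]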
So is convergence based on the cost function or the coefficients? And how do we determine the precision value? Should it be a % of the coefficient or total cost function?
Strongly convex f. In contrast, if we assume that f is strongly convex, we can show that gradient descent converges with rate O(c^k) for 0 < c < 1. This means that a bound of f(x^(k)) − f(x^*) ≤ ϵ can be achieved using only O(log(1/ϵ)) iterations. This rate is typically called “linear convergence.”
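To spell out why this gives O(log(1/ϵ)) iterations, take the geometric bound implied by the rate above (the initial gap f(x^(0)) − f(x^*) is just a constant):

\[
f(x^{(k)}) - f(x^*) \le c^{k}\,\bigl(f(x^{(0)}) - f(x^*)\bigr), \qquad 0 < c < 1,
\]
so requiring the right-hand side to be at most \(\epsilon\) gives
\[
k \;\ge\; \frac{\log\bigl(\bigl(f(x^{(0)}) - f(x^*)\bigr)/\epsilon\bigr)}{\log(1/c)} \;=\; O\bigl(\log(1/\epsilon)\bigr).
\]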
Regarding “repeat gradient descent until it converges”: no, it doesn't always converge to the global optimum. That's because in some cases it only reaches a local minimum or some other locally optimal point.
Stochastic Gradient Descent (SGD): This is a type of gradient descent that processes one training example per iteration, so the parameters are updated after every single example has been processed. Each update is therefore much cheaper than one in batch gradient descent, which has to pass over the entire training set first.
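A compact sketch of that per-example update for a simple linear model with squared error (the toy data, learning rate, and epoch count here are made up for illustration):

import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=200):
    # Fit y ~ w * x + b, updating (w, b) after every single training example.
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        random.shuffle(data)               # visit the examples in a random order each epoch
        for x, y in data:
            error = (w * x + b) - y
            w -= lr * error * x            # gradient of 0.5 * error^2 with respect to w
            b -= lr * error                # gradient of 0.5 * error^2 with respect to b
    return w, b

# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(sgd_linear_regression(xs, ys))       # roughly (2.0, 1.0)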
You can picture how Gradient Descent (GD) works by imagining that you drop a marble into a bowl and start taking photos of it. The marble oscillates until friction stops it at the bottom. Now imagine an environment where friction is so small that the marble takes a very long time to stop completely, so we can assume that once the oscillations are small enough the marble has reached the bottom (although it could keep oscillating). Over the first few steps (photos) of GD the marble moves a lot; if we keep taking photos, the movements become barely appreciable, and eventually you would have to zoom in to see any change at all.
So reaching a point at which GD makes only very small changes in your objective function is what we call convergence. That doesn't mean it has reached the optimal result, but in practice it is very close to it, if not exactly on it.
The precision value can be chosen as the threshold at which consecutive iterations of GD are almost the same:
grad(i)   = 0.0001
grad(i+1) = 0.000099989   <-- grad has changed by only ~0.01% => STOP
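Expressed as code, the same rule becomes a relative-change test (the 0.02% tolerance below is an arbitrary illustration, not a recommended value):

def has_converged(previous, current, rel_tol=0.0002):
    # Stop once the relative change between consecutive values is below rel_tol (0.02% here).
    return abs(current - previous) <= rel_tol * abs(previous)

print(has_converged(0.0001, 0.000099989))  # True: the change is about 0.011%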