
Gradient descent algorithm won't converge

I'm trying to write out a bit of code for the gradient descent algorithm explained in the Stanford Machine Learning lecture (lecture 2 at around 25:00). Below is the implementation I used at first, and I think it's properly copied over from the lecture, but it doesn't converge when I add large numbers (>8) to the training set.

I'm inputting a number X, and the point (X,X) is added to the training set, so at the moment I'm only trying to get it to converge to y=ax+b where a=1=theta[1] and b=0=theta[0]. The training set consists of the arrays x and y, where (x[i],y[i]) is a point.

void train()
{
    double delta;
    for (int i = 0; i < x.size(); i++)
    {
        delta = y[i]-hypothesis(x[i]);
        theta[1] += alpha*delta*x[i];
        theta[0] += alpha*delta*1;
    }
}

void C_Approx::display()
{
    std::cout<<theta[1]<<"x + "<<theta[0]<<" \t "<<"f(x)="<<hypothesis(1)<<std::endl;
}

Some of the results I'm getting: I input a number, it runs train() a few times, then display().

1
0.33616x + 0.33616   f(x)=0.67232
1
0.482408x + 0.482408     f(x)=0.964816
1
0.499381x + 0.499381     f(x)=0.998762
1
0.499993x + 0.499993     f(x)=0.999986
1
0.5x + 0.5   f(x)=1

An example of it diverging after it passed 8:

1
0.33616x + 0.33616   f(x)=0.67232
2
0.705508x + 0.509914     f(x)=1.21542
3
0.850024x + 0.449928     f(x)=1.29995
4
0.936062x + 0.330346     f(x)=1.26641
5
0.951346x + 0.231295     f(x)=1.18264
6
0.992876x + 0.137739     f(x)=1.13062
7
0.932206x + 0.127372     f(x)=1.05958
8
1.00077x + 0.000493063   f(x)=1.00126
9
-0.689325x + -0.0714712      f(x)=-0.760797
10
4.10321e+08x + 4.365e+07     f(x)=4.53971e+08
11
1.79968e+22x + 1.61125e+21   f(x)=1.9608e+22
12
-3.9452e+41x + -3.26957e+40      f(x)=-4.27216e+41

I tried the solution proposed here of scaling the step and ended up with similar results. What am I doing wrong?

Why does gradient descent not converge?

If the learning rate is too small, each step is tiny, so convergence is slow and may not be reached within the iterations available. If the learning rate is too large, gradient descent overshoots the minimum and can ultimately fail to converge.

Is gradient descent guaranteed to converge?

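Only under suitable conditions. For a convex, differentiable function whose gradient is Lipschitz continuous, gradient descent with a sufficiently small fixed step size is guaranteed to converge, and it does so with rate O(1/k). In that setting the objective value decreases with each iteration until it reaches the optimal value f(x) = f(x∗).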
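For reference, the usual textbook form of this guarantee can be written as below. It assumes a convex, differentiable f whose gradient is Lipschitz continuous with constant L, and a fixed step size t ≤ 1/L; the symbols t, L, and x^{(k)} are notation introduced here, not defined in the text above.

```latex
\[
  x^{(k)} = x^{(k-1)} - t\,\nabla f\bigl(x^{(k-1)}\bigr),
  \qquad
  f\bigl(x^{(k)}\bigr) - f(x^{*})
  \;\le\;
  \frac{\lVert x^{(0)} - x^{*} \rVert_2^{2}}{2\,t\,k}.
\]
```

The right-hand side shrinks like 1/k, which is where the O(1/k) rate comes from. If the step size exceeds what the bound allows, as with a learning rate that is too large, this guarantee no longer applies.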

Under what conditions does gradient descent converge?

With a suitable learning rate, gradient descent reduces the cost function at each step, and it converges when it reaches a point where the gradient of the cost function is zero.

Will gradient descent methods always converge to the same point?

No, not always. Depending on the starting point and the learning rate, gradient descent can end up in different local minima (local optima) of the same function.

Your implementation is good. Generally, stochastic gradient descent might diverge when α is too large. What you would do with a large dataset is take a reasonably sized random sample, find α that gives you the best results, and then use it for the rest.

I have experienced the same problem (albeit in Java) because my learning rate was too big.
In short, I was using α = 0.001 and had to lower it to 0.000001 to see actual convergence.

Of course these values are linked to your dataset.

Tags: c++, machine-learning, linear-regression

Contributors: howardh, Don Reba, MonoThreaded
