
Multi-variable gradient descent

I am learning gradient descent for calculating coefficients. Below is what I am doing:

    #!/usr/bin/env python

    import numpy as np


    # m denotes the number of examples here, not the number of features
    def gradientDescent(x, y, theta, alpha, m, numIterations):
        xTrans = x.transpose()
        for i in range(0, numIterations):
            hypothesis = np.dot(x, theta)
            loss = hypothesis - y
            # avg cost per example (the 2 in 2*m doesn't really matter here,
            # but to be consistent with the gradient I include it)
            cost = np.sum(loss ** 2) / (2 * m)
            # print("Iteration %d | Cost: %f" % (i, cost))
            # avg gradient per example
            gradient = np.dot(xTrans, loss) / m
            # update
            theta = theta - alpha * gradient
        return theta

    X = np.array([41.9, 43.4, 43.9, 44.5, 47.3, 47.5, 47.9, 50.2, 52.8,
                  53.2, 56.7, 57.0, 63.5, 65.3, 71.1, 77.0, 77.8])
    y = np.array([251.3, 251.3, 248.3, 267.5, 273.0, 276.5, 270.3, 274.9, 285.0,
                  290.0, 297.0, 302.5, 304.5, 309.3, 321.7, 330.7, 349.0])
    n = np.max(X.shape)
    x = np.vstack([np.ones(n), X]).T
    m, n = np.shape(x)
    numIterations = 100000
    alpha = 0.0005
    theta = np.ones(n)
    theta = gradientDescent(x, y, theta, alpha, m, numIterations)
    print(theta)

Now, the above code works fine. But if I try multiple variables and replace X with X1 like the following:

    X1 = np.array([[41.9, 43.4, 43.9, 44.5, 47.3, 47.5, 47.9, 50.2, 52.8, 53.2, 56.7, 57.0, 63.5, 65.3, 71.1, 77.0, 77.8],
                   [29.1, 29.3, 29.5, 29.7, 29.9, 30.3, 30.5, 30.7, 30.8, 30.9, 31.5, 31.7, 31.9, 32.0, 32.1, 32.5, 32.9]])

then my code fails and shows me the following error:

    JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
    cost = np.sum(loss ** 2) / (2 * m)
    JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
    theta = theta - alpha * gradient
    [ nan  nan  nan]

Can anybody tell me how I can do gradient descent using X1? My expected output using X1 is:

    [-153.5 1.24 12.08]

I am also open to other Python implementations. I just want the coefficients (also called thetas) for X1 and y.


1 Answer

The problem is that your algorithm does not converge. It diverges instead. The first warning:

    JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
    cost = np.sum(loss ** 2) / (2 * m)

comes from the fact that at some point the squared loss can no longer be represented: a 64-bit float cannot hold numbers larger than about 1.8 × 10^308, so the square overflows.
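
You can reproduce this limit in isolation (a standalone demo, independent of the script above):

    import numpy as np

    # the largest value a 64-bit float can represent is about 1.8e308
    print(np.finfo(np.float64).max)   # 1.7976931348623157e+308

    # squaring past that limit overflows to inf and emits the same warning
    loss = np.array([1e200])
    print(loss ** 2)                  # [inf]  RuntimeWarning: overflow encountered in square

The second warning: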

    JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
    theta = theta - alpha * gradient

is just a consequence of the first overflow: once the loss contains inf, alpha * gradient becomes inf as well, and the update inf - inf produces nan. From then on the numbers are meaningless.

You can actually see the divergence by uncommenting your debug print line: the cost grows with each iteration instead of shrinking, because there is no convergence.

If you try your function with X1 and a smaller value for alpha, it converges; see the sketch below.
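
For example, here is a minimal sketch reusing the gradientDescent function and the same preprocessing as in the question (the step size below is a hypothetical value chosen for illustration; tune it until the printed cost keeps decreasing):

    # build the design matrix exactly as before: a row of ones stacked on top
    # of X1, transposed so each of the 17 examples becomes a row (shape (17, 3))
    n = np.max(X1.shape)
    x = np.vstack([np.ones(n), X1]).T
    m, n = np.shape(x)
    theta = np.ones(n)

    # alpha = 0.0005 overshoots with features of this magnitude and diverges;
    # a smaller step keeps the cost decreasing (hypothetical value, tune as needed)
    alpha = 0.0001
    theta = gradientDescent(x, y, theta, alpha, m, 100000)
    print(theta)

With a small alpha the cost decreases on every iteration, but convergence is slow, so it can take many more iterations than the single-variable case before theta gets close to the expected [-153.5, 1.24, 12.08]. Uncommenting the cost print makes the progress easy to watch.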
