I'm studying simple machine learning algorithms, beginning with simple gradient descent, but I'm having some trouble trying to implement it in Python.
Here is the example I'm trying to reproduce: I've got data about houses (living area, in square feet, and number of bedrooms) with the resulting price:
Living area (feet²): 2104
#bedrooms: 3
Price (1000$s): 400
I'm trying to do a simple regression using the gradient descent method, but my algorithm won't work... The algorithm is deliberately written without vectors, because I'm trying to understand it step by step.
import sys
import random

i = 1
derror = sys.maxint
error = 0
step = 0.0001
dthresh = 0.1
theta1 = random.random()
theta2 = random.random()
theta0 = random.random()
while derror > dthresh:
    # single training example: living area 2104, 3 bedrooms -> price 400
    diff = 400 - theta0 - 2104 * theta1 - 3 * theta2
    theta0 = theta0 + step * diff * 1
    theta1 = theta1 + step * diff * 2104
    theta2 = theta2 + step * diff * 3
    hserror = diff**2 / 2   # squared error for this single point
    derror = abs(error - hserror)
    error = hserror
    print 'iteration : %d, error : %s' % (i, error)
    i += 1
I understand the math: I'm constructing a predicting function h(x) = theta0 + theta1 * x1 + theta2 * x2, with x1 and x2 being the variables (living area, number of bedrooms) and h(x) the estimated price.
I'm using the cost function (hserror in the code) for one point: hserror = (y - h(x))**2 / 2. This is a usual problem, but I'm more of a software engineer and I'm learning one step at a time, so can you tell me what's wrong?
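To make the update rule I think I'm applying explicit (this restatement is mine, using the standard least-squares derivative): since hserror = (y - h(x))**2 / 2, its derivative with respect to theta_j is -(y - h(x)) * x_j, so stepping against the gradient adds step * diff * x_j to each theta_j. As a sketch of what the loop above does (names are illustrative only):

# Sketch of one gradient-descent update for a single example (x1, x2, y).
# This mirrors the three update lines in the loop above.
def one_update(theta0, theta1, theta2, x1, x2, y, step):
    diff = y - (theta0 + theta1 * x1 + theta2 * x2)  # y - h(x)
    theta0 += step * diff * 1    # x0 is the constant 1
    theta1 += step * diff * x1
    theta2 += step * diff * x2
    return theta0, theta1, theta2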
I got it working with this code:
data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369, (1416, 2): 232, (3000, 4): 540}

import sys
import random

for x in range(10):
    i = 1
    derror = sys.maxint
    error = 0
    step = 0.00000001
    dthresh = 0.0000000001
    theta1 = random.random() * 100
    theta2 = random.random() * 100
    theta0 = random.random() * 100
    while derror > dthresh:
        # still fitting only the first point (2104 ft², 3 bedrooms -> 400)
        diff = 400 - (theta0 + 2104 * theta1 + 3 * theta2)
        theta0 = theta0 + step * diff * 1
        theta1 = theta1 + step * diff * 2104
        theta2 = theta2 + step * diff * 3
        hserror = diff**2 / 2
        derror = abs(error - hserror)
        error = hserror
        #print 'iteration : %d, error : %s, derror : %s' % (i, error, derror)
        i += 1
    print ' theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2)
    print ' done : %f' % (theta0 + 2104 * theta1 + 3 * theta2)
which ends up with answers like this:
theta0 : 48.412337, theta1 : 0.094492, theta2 : 50.925579
done : 400.000043
theta0 : 0.574007, theta1 : 0.185363, theta2 : 3.140553
done : 400.000042
theta0 : 28.588457, theta1 : 0.041746, theta2 : 94.525769
done : 400.000043
theta0 : 42.240593, theta1 : 0.096398, theta2 : 51.645989
done : 400.000043
theta0 : 98.452431, theta1 : 0.136432, theta2 : 4.831866
done : 400.000043
theta0 : 18.022160, theta1 : 0.148059, theta2 : 23.487524
done : 400.000043
theta0 : 39.461977, theta1 : 0.097899, theta2 : 51.519412
done : 400.000042
theta0 : 40.979868, theta1 : 0.040312, theta2 : 91.401406
done : 400.000043
theta0 : 15.466259, theta1 : 0.111276, theta2 : 50.136221
done : 400.000043
theta0 : 72.380926, theta1 : 0.013814, theta2 : 99.517853
done : 400.000043
The first issue is that running this with only one piece of data gives you an underdetermined system: it may have an infinite number of solutions. With three variables you'd expect at least 3 data points, and preferably many more.
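For instance (a quick check of my own, using the numbers printed above), several of those very different parameter sets all satisfy the single equation theta0 + 2104 * theta1 + 3 * theta2 ≈ 400, which is exactly what an underdetermined system looks like:

# Quick illustration: wildly different thetas (taken from the output above)
# all fit the single data point equally well.
candidates = [
    (48.412337, 0.094492, 50.925579),
    (0.574007, 0.185363, 3.140553),
    (98.452431, 0.136432, 4.831866),
]
for theta0, theta1, theta2 in candidates:
    print(theta0 + 2104 * theta1 + 3 * theta2)  # each prints roughly 400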
Secondly, using gradient descent where the step size is a scaled version of the gradient is not guaranteed to converge except in a small neighbourhood of the solution. You can fix that by switching either to a fixed-size step in the direction of the negative gradient (slow) or to a line search in the direction of the negative gradient (faster, but slightly more complicated; see the sketch after the code below).
So for a fixed step size, instead of

theta0 = theta0 - step * dEdtheta0
theta1 = theta1 - step * dEdtheta1
theta2 = theta2 - step * dEdtheta2

you do this:

# normalise by the largest gradient magnitude so each step has a fixed size
n = max([abs(dEdtheta0), abs(dEdtheta1), abs(dEdtheta2)])
theta0 = theta0 - step * dEdtheta0 / n
theta1 = theta1 - step * dEdtheta1 / n
theta2 = theta2 - step * dEdtheta2 / n
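If you want the line-search variant mentioned above, here is a minimal backtracking sketch (my own illustration, not a drop-in for your loop; E stands for whatever error function you are minimising):

# Backtracking line search along the negative gradient (illustrative sketch).
# theta and grad are lists of the same length; E maps a theta list to an error value.
def backtracking_step(E, theta, grad, step=1.0, shrink=0.5, max_tries=30):
    current = E(theta)
    for _ in range(max_tries):
        candidate = [t - step * g for t, g in zip(theta, grad)]
        if E(candidate) < current:  # accept the first step that reduces the error
            return candidate
        step *= shrink              # otherwise shrink the step and try again
    return theta                    # give up: no improving step found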
It also looks like you may have a sign error in your steps.
I'm also not sure that derror is a good stopping criterion. (But stopping criteria are notoriously hard to get "right".)
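One common alternative (my suggestion, not something from the code above) is to stop when the gradient itself is close to zero rather than when the error stops changing:

# Illustrative stopping test on the gradient instead of on the change in error.
def should_stop(dEdtheta0, dEdtheta1, dEdtheta2, tol=1e-6):
    grad_norm = (dEdtheta0**2 + dEdtheta1**2 + dEdtheta2**2) ** 0.5
    return grad_norm < tol  # near-zero gradient means we're at a stationary point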
My final point is that gradient descent is horribly slow for parameter fitting. You probably want to use conjugate-gradient or Levenberg-Marquardt methods instead. I suspect that both of these methods already exist for Python in the numpy or scipy packages (which aren't part of Python by default but are pretty easy to install).
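As a sketch of that last suggestion (my own code, assuming numpy and scipy are installed), the full five-house data set can be fitted with a library optimiser, for example scipy.optimize.minimize with the conjugate-gradient method:

import numpy as np
from scipy.optimize import minimize

# Same data as in the question: (living area, #bedrooms) -> price
X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

def cost(theta):
    # half the sum of squared residuals for h(x) = theta0 + theta1*x1 + theta2*x2
    pred = theta[0] + X.dot(theta[1:])
    return 0.5 * np.sum((pred - y) ** 2)

result = minimize(cost, x0=np.zeros(3), method='CG')
print(result.x)  # fitted theta0, theta1, theta2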