I'm studying simple machine learning algorithms, beginning with simple gradient descent, but I'm having some trouble trying to implement it in Python.
Here is the example I'm trying to reproduce: I've got data about houses (living area, in square feet, and number of bedrooms) with the resulting price:
Living area (feet²): 2104
#bedrooms: 3
Price (1000$s): 400
I'm trying to do a simple regression using the gradient descent method, but my algorithm won't work... The algorithm is deliberately written without vectors, because I'm trying to understand it step by step.
import sys
import random

i = 1
derror = sys.maxint
error = 0
step = 0.0001
dthresh = 0.1
theta1 = random.random()
theta2 = random.random()
theta0 = random.random()
while derror > dthresh:
    # single training example: living area 2104, 3 bedrooms -> price 400
    diff = 400 - theta0 - 2104 * theta1 - 3 * theta2
    theta0 = theta0 + step * diff * 1
    theta1 = theta1 + step * diff * 2104
    theta2 = theta2 + step * diff * 3
    hserror = diff**2 / 2   # squared error for this single point
    derror = abs(error - hserror)
    error = hserror
    print 'iteration : %d, error : %s' % (i, error)
    i += 1
I understand the math: I'm constructing a predicting function h(x) = theta0 + theta1 * x1 + theta2 * x2, with x1 and x2 being the variables (living area, number of bedrooms) and h(x) the estimated price.
I'm using the cost function (hserror in the code) for one point: hserror = (y - h(x))**2 / 2. This is a usual problem, but I'm more of a software engineer and I'm learning one step at a time, so can you tell me what's wrong?
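To make the update rule I think I'm applying explicit (this restatement is mine, using the standard least-squares derivative): since hserror = (y - h(x))**2 / 2, its derivative with respect to theta_j is -(y - h(x)) * x_j, so stepping against the gradient adds step * diff * x_j to each theta_j. As a sketch of what the loop above does (names are illustrative only):

# Sketch of one gradient-descent update for a single example (x1, x2, y).
# This mirrors the three update lines in the loop above.
def one_update(theta0, theta1, theta2, x1, x2, y, step):
    diff = y - (theta0 + theta1 * x1 + theta2 * x2)  # y - h(x)
    theta0 += step * diff * 1    # x0 is the constant 1
    theta1 += step * diff * x1
    theta2 += step * diff * x2
    return theta0, theta1, theta2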
I got it working with this code:
data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369, (1416, 2): 232, (3000, 4): 540}

import sys
import random

for x in range(10):
    i = 1
    derror = sys.maxint
    error = 0
    step = 0.00000001
    dthresh = 0.0000000001
    theta1 = random.random() * 100
    theta2 = random.random() * 100
    theta0 = random.random() * 100
    while derror > dthresh:
        # still fitting only the first point (2104 ft², 3 bedrooms -> 400)
        diff = 400 - (theta0 + 2104 * theta1 + 3 * theta2)
        theta0 = theta0 + step * diff * 1
        theta1 = theta1 + step * diff * 2104
        theta2 = theta2 + step * diff * 3
        hserror = diff**2 / 2
        derror = abs(error - hserror)
        error = hserror
        #print 'iteration : %d, error : %s, derror : %s' % (i, error, derror)
        i += 1
    print ' theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2)
    print ' done : %f' % (theta0 + 2104 * theta1 + 3 * theta2)
which ends up with answers like this:
theta0 : 48.412337, theta1 : 0.094492, theta2 : 50.925579
done : 400.000043
theta0 : 0.574007, theta1 : 0.185363, theta2 : 3.140553
done : 400.000042
theta0 : 28.588457, theta1 : 0.041746, theta2 : 94.525769
done : 400.000043
theta0 : 42.240593, theta1 : 0.096398, theta2 : 51.645989
done : 400.000043
theta0 : 98.452431, theta1 : 0.136432, theta2 : 4.831866
done : 400.000043
theta0 : 18.022160, theta1 : 0.148059, theta2 : 23.487524
done : 400.000043
theta0 : 39.461977, theta1 : 0.097899, theta2 : 51.519412
done : 400.000042
theta0 : 40.979868, theta1 : 0.040312, theta2 : 91.401406
done : 400.000043
theta0 : 15.466259, theta1 : 0.111276, theta2 : 50.136221
done : 400.000043
theta0 : 72.380926, theta1 : 0.013814, theta2 : 99.517853
done : 400.000043
The first issue is that running this with only one piece of data gives you an underdetermined system: it may have an infinite number of solutions. With three variables you'd expect at least 3 data points, and preferably many more.
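For instance (a quick check of my own, using the numbers printed above), several of those very different parameter sets all satisfy the single equation theta0 + 2104 * theta1 + 3 * theta2 ≈ 400, which is exactly what an underdetermined system looks like:

# Quick illustration: wildly different thetas (taken from the output above)
# all fit the single data point equally well.
candidates = [
    (48.412337, 0.094492, 50.925579),
    (0.574007, 0.185363, 3.140553),
    (98.452431, 0.136432, 4.831866),
]
for theta0, theta1, theta2 in candidates:
    print(theta0 + 2104 * theta1 + 3 * theta2)  # each prints roughly 400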
Secondly, using gradient descent where the step size is a scaled version of the gradient is not guaranteed to converge except in a small neighbourhood of the solution. You can fix that by switching either to a fixed-size step in the direction of the negative gradient (slow) or to a line search in the direction of the negative gradient (faster, but slightly more complicated; see the sketch after the code below).
So for a fixed step size, instead of

theta0 = theta0 - step * dEdtheta0
theta1 = theta1 - step * dEdtheta1
theta2 = theta2 - step * dEdtheta2

you do this:

# normalise by the largest gradient magnitude so each step has a fixed size
n = max([abs(dEdtheta0), abs(dEdtheta1), abs(dEdtheta2)])
theta0 = theta0 - step * dEdtheta0 / n
theta1 = theta1 - step * dEdtheta1 / n
theta2 = theta2 - step * dEdtheta2 / n
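If you want the line-search variant mentioned above, here is a minimal backtracking sketch (my own illustration, not a drop-in for your loop; E stands for whatever error function you are minimising):

# Backtracking line search along the negative gradient (illustrative sketch).
# theta and grad are lists of the same length; E maps a theta list to an error value.
def backtracking_step(E, theta, grad, step=1.0, shrink=0.5, max_tries=30):
    current = E(theta)
    for _ in range(max_tries):
        candidate = [t - step * g for t, g in zip(theta, grad)]
        if E(candidate) < current:  # accept the first step that reduces the error
            return candidate
        step *= shrink              # otherwise shrink the step and try again
    return theta                    # give up: no improving step found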
It also looks like you may have a sign error in your steps.
I'm also not sure that derror is a good stopping criterion. (But stopping criteria are notoriously hard to get "right".)
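One common alternative (my suggestion, not something from the code above) is to stop when the gradient itself is close to zero rather than when the error stops changing:

# Illustrative stopping test on the gradient instead of on the change in error.
def should_stop(dEdtheta0, dEdtheta1, dEdtheta2, tol=1e-6):
    grad_norm = (dEdtheta0**2 + dEdtheta1**2 + dEdtheta2**2) ** 0.5
    return grad_norm < tol  # near-zero gradient means we're at a stationary point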
My final point is that gradient descent is horribly slow for parameter fitting. You probably want to use conjugate-gradient or Levenberg-Marquardt methods instead. I suspect that both of these methods already exist for Python in the numpy or scipy packages (which aren't part of Python by default but are pretty easy to install).
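As a sketch of that last suggestion (my own code, assuming numpy and scipy are installed), the full five-house data set can be fitted with a library optimiser, for example scipy.optimize.minimize with the conjugate-gradient method:

import numpy as np
from scipy.optimize import minimize

# Same data as in the question: (living area, #bedrooms) -> price
X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

def cost(theta):
    # half the sum of squared residuals for h(x) = theta0 + theta1*x1 + theta2*x2
    pred = theta[0] + X.dot(theta[1:])
    return 0.5 * np.sum((pred - y) ** 2)

result = minimize(cost, x0=np.zeros(3), method='CG')
print(result.x)  # fitted theta0, theta1, theta2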