I have to do logistic regression using batch gradient descent.
import numpy as np
X = np.asarray([
[0.50],[0.75],[1.00],[1.25],[1.50],[1.75],[1.75],
[2.00],[2.25],[2.50],[2.75],[3.00],[3.25],[3.50],
[4.00],[4.25],[4.50],[4.75],[5.00],[5.50]])
y = np.asarray([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1])
m = len(X)
def sigmoid(a):
    return 1.0 / (1 + np.exp(-a))

def gradient_Descent(theta, alpha, X, y):
    for i in range(0, m):
        cost = ((-y) * np.log(sigmoid(X[i]))) - ((1 - y) * np.log(1 - sigmoid(X[i])))
        grad = theta - alpha * (1.0/m) * (np.dot(cost, X[i]))
        theta = theta - alpha * grad

gradient_Descent(0.1, 0.005, X, y)
This is the way I have to do it, but I can't seem to figure out how to make it work.
Surprisingly, the update rule is the same as the one derived by using the sum of the squared errors in linear regression. As a result, we can use the same gradient descent formula for logistic regression as well.
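Written out, the batch update rule being referred to is (with sigmoid $\sigma$, learning rate $\alpha$ and $m$ training examples):

$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \sigma(\theta^T x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

The only difference from the linear regression case is that the hypothesis $\sigma(\theta^T x)$ replaces $\theta^T x$.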
For logistic regression, there is no longer a closed-form solution, due to the nonlinearity of the logistic sigmoid function.
Gradient descent need not always converge to the global minimum; that is only guaranteed when the function being minimized is convex.
Gradient descent is an iterative optimization algorithm that finds the parameters or coefficients at which a function attains a minimum value. In general it is not guaranteed to find a global minimum and can get stuck in a local minimum, but for logistic regression this is not a concern because the cross-entropy cost is convex.
It looks like you have some stuff mixed up in here. It's critical when doing this that you keep track of the shapes of your vectors and make sure you're getting sensible results. For example, you are calculating cost with:
cost = ((-y) * np.log(sigmoid(X[i]))) - ((1 - y) * np.log(1 - sigmoid(X[i])))
In your case y is a vector with 20 items and X[i] is a single value. This makes your cost calculation a 20-item vector, which doesn't make sense. Your cost should be a single value. (You're also calculating this cost a bunch of times for no reason inside your gradient descent function.)
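You can see the problem with a quick shape check (reusing y and sigmoid from the question; x_i here is just a stand-in for a single X[i]):

import numpy as np

y = np.asarray([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1])
x_i = np.asarray([0.50])  # stand-in for X[i]: a single training example

def sigmoid(a):
    return 1.0 / (1 + np.exp(-a))

# y (20 elements) broadcasts against the single value, so "cost" ends up as a vector
cost = ((-y) * np.log(sigmoid(x_i))) - ((1 - y) * np.log(1 - sigmoid(x_i)))
print(cost.shape)  # (20,) -- a whole vector, not the single number a cost should be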
Also, if you want this to be able to fit your data, you need to add a bias term to X. So let's start there.
X = np.asarray([
[0.50],[0.75],[1.00],[1.25],[1.50],[1.75],[1.75],
[2.00],[2.25],[2.50],[2.75],[3.00],[3.25],[3.50],
[4.00],[4.25],[4.50],[4.75],[5.00],[5.50]])
ones = np.ones(X.shape)
X = np.hstack([ones, X])
# X.shape is now (20, 2)
Theta will now need 2 values, one for each column of X. So initialize that and Y:
Y = np.array([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1]).reshape([-1, 1])
# reshape Y so it's a column vector, which makes the matrix multiplication easier
Theta = np.array([[0], [0]])
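It's worth double-checking the shapes at this point (using the X, Y and Theta just defined):

print(X.shape)      # (20, 2) -- 20 examples, bias column plus the original feature
print(Y.shape)      # (20, 1) -- one label per example, as a column vector
print(Theta.shape)  # (2, 1)  -- one parameter per column of X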
Your sigmoid function is good. Let's also make a vectorized cost function:
def sigmoid(a):
    return 1.0 / (1 + np.exp(-a))

def cost(x, y, theta):
    m = x.shape[0]
    h = sigmoid(np.matmul(x, theta))
    cost = (np.matmul(-y.T, np.log(h)) - np.matmul((1 - y.T), np.log(1 - h))) / m
    return cost
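As a quick sanity check, with Theta still all zeros every prediction is sigmoid(0) = 0.5, so the cost should come out to ln(2):

print(cost(X, Y, Theta))  # [[0.69314718]] -- ln(2), as expected for an all-zero Theta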
The cost function works because Theta has a shape of (2, 1) and X has a shape of (20, 2), so matmul(X, Theta) will be shaped (20, 1). We then matrix multiply by the transpose of Y (y.T has shape (1, 20)), which results in a single value: our cost given a particular value of Theta.
We can then write a function that performs a single step of batch gradient descent:
def gradient_Descent(theta, alpha, x, y):
    m = x.shape[0]
    h = sigmoid(np.matmul(x, theta))
    grad = np.matmul(x.T, (h - y)) / m
    theta = theta - alpha * grad
    return theta
Notice np.matmul(x.T, (h - y)) is multiplying shapes (2, 20) and (20, 1), which results in a shape of (2, 1), the same shape as Theta, which is what you want from your gradient. This allows you to multiply it by your learning rate and subtract it from the current Theta, which is what gradient descent is supposed to do.
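You can verify those shapes directly (using the X, Y and Theta defined above):

h = sigmoid(np.matmul(X, Theta))
print(h.shape)                      # (20, 1)
print((h - Y).shape)                # (20, 1)
print(X.T.shape)                    # (2, 20)
print(np.matmul(X.T, h - Y).shape)  # (2, 1) -- same shape as Theta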
So now you just write a loop for a number of iterations and update Theta until it looks like it converges:
n_iterations = 500
learning_rate = 0.5

for i in range(n_iterations):
    Theta = gradient_Descent(Theta, learning_rate, X, Y)
    if i % 50 == 0:
        print(cost(X, Y, Theta))
This will print the cost every 50 iterations, and you should see it decrease steadily, which is what you hope for:
[[ 0.6410409]]
[[ 0.44766253]]
[[ 0.41593581]]
[[ 0.40697167]]
[[ 0.40377785]]
[[ 0.4024982]]
[[ 0.40195]]
[[ 0.40170533]]
[[ 0.40159325]]
[[ 0.40154101]]
You can try different initial values of Theta and you will see it always converges to the same thing.
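For example, restarting from a different (arbitrarily chosen) starting point and running the same loop should end up at roughly the same cost:

Theta = np.array([[1.0], [-1.0]])  # an arbitrary starting point, just for illustration
for i in range(n_iterations):
    Theta = gradient_Descent(Theta, learning_rate, X, Y)
print(cost(X, Y, Theta))  # should come out close to the ~0.4015 reached above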
Now you can use your newly found values of Theta to make predictions:
h = sigmoid(np.matmul(X, Theta))
print((h > .5).astype(int))
This prints what you would expect from a linear decision boundary fit to your data:
[[0]
[0]
[0]
[0]
[0]
[0]
[0]
[0]
[0]
[0]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]
[1]]
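If you want the actual cutoff, with a single feature the decision boundary is where Theta[0] + Theta[1] * x = 0, so you can read it off directly:

boundary = -Theta[0, 0] / Theta[1, 0]
print(boundary)  # given the predictions above, this should fall between 2.50 and 2.75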