I want to implement Coordinate Descent in Python and compare the result with that of Gradient Descent. I wrote the code below, but it does not work well: GD seems more or less OK, but CD is not.
This is the reference I used for Coordinate Descent: LINK
And this is the result I expected: [plot: Gradient Descent vs Coordinate Descent]
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(10, 100, 1000)
y = np.linspace(10, 100, 1000)

def func(x, y, param):
    return param[0] * x + param[1] * y

def costf(X, y, param):
    return np.sum((X.dot(param) - y) ** 2)/(2 * len(y))

z = func(x, y, [5, 8]) + np.random.normal(0., 10.)
z = z.reshape(-1, 1)

interc = np.ones(1000)
X = np.concatenate([interc.reshape(-1, 1), x.reshape(-1, 1), y.reshape(-1, 1)], axis=1)

#param = np.random.randn(3).T
param = np.array([2, 2, 2])

def gradient_descent(X, y, param, eta=0.01, iter=100):
    cost_history = [0] * iter
    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y
        gradient = X.T.dot(loss)/(2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost
    return param, cost_history

def coordinate_descent(X, y, param, iter=100):
    cost_history = [0] * iter
    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost
    return param, cost_history

ret, xret = gradient_descent(X, z, param)
cret, cxret = coordinate_descent(X, z, param)

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
plt.show()
Thank you in advance.
I'm not an expert in optimization; here is just some advice.
Your code looks pretty good to me, except that:
1. The following two lines seem wrong (see the short demo right after this list):
    loss = h - y
    param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))
You reshaped y to (n_samples, 1) with z = z.reshape(-1, 1), while h and dele have shape (n_samples,). Subtracting them broadcasts to an (n_samples, n_samples) matrix, which I think is not what you wanted.
2. Your data might not be that good for test purposes. That doesn't mean it won't work, but you may need some extra effort to tune the learning rate / number of iterations.
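To illustrate point 1, here is a minimal sketch (plain NumPy, with arbitrary placeholder shapes, not taken from your script) of how the shape mismatch turns the residual into a matrix, and how ravel() avoids it:

import numpy as np

n = 5
h = np.zeros(n)           # shape (n,), like X.dot(param)
y_col = np.zeros((n, 1))  # shape (n, 1), like z after reshape(-1, 1)

# Broadcasting (n,) against (n, 1) yields an (n, n) matrix instead of a residual vector.
print((h - y_col).shape)          # (5, 5)

# Flattening y first keeps the residual a vector of length n.
print((h - y_col.ravel()).shape)  # (5,)

Using y.ravel() as in the code below, or simply keeping y one-dimensional from the start, avoids this.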
I tried your code on the Boston house-prices dataset; the output looks like this:
[plot: GD and CD cost histories over iterations]
The code:
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

# boston house-prices dataset
data = load_boston()
X, y = data.data, data.target
X = StandardScaler().fit_transform(X)  # for easy convergence
X = np.hstack([X, np.ones((X.shape[0], 1))])

param = np.zeros(X.shape[1])

def gradient_descent(X, y, param, eta=0.01, iter=300):
    cost_history = [0] * (iter+1)
    cost_history[0] = costf(X, y, param)  # you may want to save initial cost
    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y.ravel()
        gradient = X.T.dot(loss)/(2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration+1] = cost
    return param, cost_history

def coordinate_descent(X, y, param, iter=300):
    cost_history = [0] * (iter+1)
    cost_history[0] = costf(X, y, param)
    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:,i].T, (y.ravel() - dele))/np.sum(np.square(X[:,i]))
        cost = costf(X, y, param)
        cost_history[iteration+1] = cost
    return param, cost_history

ret, xret = gradient_descent(X, y, param)
cret, cxret = coordinate_descent(X, y, param.copy())

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
If this is what you want, you can then try it on some other data. Just be careful that even if the implementation is right, you still need to tune the hyperparameters to get good convergence.
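For example, a quick, hypothetical sweep over the learning rate (reusing X, y and the gradient_descent defined above; the values are just placeholders) could look like this:

# Hypothetical sweep: rerun GD from the same zero start and compare final costs.
for eta in [0.001, 0.01, 0.1]:
    _, history = gradient_descent(X, y, np.zeros(X.shape[1]), eta=eta, iter=300)
    print(f"eta={eta}: final cost = {history[-1]:.4f}")

Coordinate descent as written has no learning rate, so there the main knob is the number of sweeps (iter).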