
Coordinate descent in Python [closed]

I want to implement Coordinate Descent in Python and compare its results with those of Gradient Descent. I wrote the code below, but it does not work well: GD seems fine, but CD does not converge properly.

This is the reference to Coordinate Descent. --- LINK

And this is the result I expected: a plot comparing Gradient Descent vs Coordinate Descent.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(10, 100, 1000)
y = np.linspace(10, 100, 1000)

def func(x, y, param):
    return param[0] * x + param[1] * y

def costf(X, y, param):
    return np.sum((X.dot(param) - y) ** 2)/(2 * len(y))

z = func(x, y, [5, 8]) + np.random.normal(0., 10.)
z = z.reshape(-1, 1)

interc = np.ones(1000)
X = np.concatenate([interc.reshape(-1, 1), x.reshape(-1, 1), y.reshape(-1, 1)], axis=1)

#param = np.random.randn(3).T
param = np.array([2, 2, 2])

def gradient_descent(X, y, param, eta=0.01, iter=100):
    cost_history = [0] * iter

    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y
        gradient = X.T.dot(loss)/(2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost

    return param, cost_history


def coordinate_descent(X, y, param, iter=100):
    cost_history = [0] * iter

    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))
            cost = costf(X, y, param)
            #print(cost)
            cost_history[iteration] = cost

    return param, cost_history


ret, xret = gradient_descent(X, z, param)
cret, cxret = coordinate_descent(X, z, param)

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
plt.show()

Thank you in advance.


1 Answer

I'm not an expert in optimization; here's just some advice.

Your code looks pretty good to me, except that

1. The following two lines seem wrong:

loss = h - y

param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))

You reshaped y to (n_samples, 1) with z = z.reshape(-1, 1), while h and dele have shape (n_samples,). Subtracting a (n_samples,) array from a (n_samples, 1) array broadcasts to a (n_samples, n_samples) matrix, which I think is not what you wanted.
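Here is a minimal sketch of that broadcasting issue (the array size is just for illustration):

import numpy as np

n = 5
y_col = np.arange(n, dtype=float).reshape(-1, 1)  # shape (n, 1), like your reshaped z
h = np.arange(n, dtype=float)                     # shape (n,), like X.dot(param)

print((h - y_col).shape)          # (5, 5) -- broadcast matrix, not what the gradient needs
print((h - y_col.ravel()).shape)  # (5,)   -- elementwise difference per sample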

2. Your data might not be that good for testing purposes (for example, x and y are identical, so the two feature columns are perfectly collinear, and the noise term is a single scalar added to every sample). That doesn't mean it won't work, but you may need some extra effort to tune the learning rate / number of iterations.
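If you want to stick with synthetic data, here is a small sketch of a test set that is easier to work with, assuming independent features and per-sample noise (the sizes and true coefficients are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.uniform(10, 100, n)            # two independent features instead of identical ones
x2 = rng.uniform(10, 100, n)
noise = rng.normal(0., 10., n)          # one noise value per sample

X = np.column_stack([np.ones(n), x1, x2])   # intercept column plus the two features
y = 3.0 + 5.0 * x1 + 8.0 * x2 + noise       # true params: intercept 3, slopes 5 and 8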

I tried your code on the Boston housing dataset; the output looks like this.

(plot of the GD and CD cost histories over the iterations)

The code:

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

#  boston house-prices dataset
data = load_boston()
X, y = data.data, data.target

X = StandardScaler().fit_transform(X)  # for easy convergence
X = np.hstack([X, np.ones((X.shape[0], 1))])  # add an intercept column

param = np.zeros(X.shape[1])

def gradient_descent(X, y, param, eta=0.01, iter=300):
    cost_history = [0] * (iter+1)
    cost_history[0] = costf(X, y, param)  # you may want to save initial cost

    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y.ravel()
        gradient = X.T.dot(loss)/(2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration+1] = cost

    return param, cost_history

def coordinate_descent(X, y, param, iter=300):
    cost_history = [0] * (iter+1)
    cost_history[0] = costf(X, y, param)

    for iteration in range(iter):
        for i in range(len(param)):
            # exact minimizer of the cost w.r.t. param[i], holding the other coordinates fixed
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:,i].T, (y.ravel() - dele))/np.sum(np.square(X[:,i]))
            cost = costf(X, y, param)
            cost_history[iteration+1] = cost

    return param, cost_history

ret, xret = gradient_descent(X, y, param)
cret, cxret = coordinate_descent(X, y, param.copy())

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()

If this is what you want, you can then try testing on some other data. Just be careful: even if the implementation is right, you may still need to tune hyperparameters (learning rate, number of iterations) to get good convergence.
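For example, a quick sketch of comparing a few learning rates with the GD version above (the eta values are just arbitrary guesses):

for eta in (0.001, 0.01, 0.1):      # arbitrary candidate learning rates
    _, history = gradient_descent(X, y, param.copy(), eta=eta)
    plt.plot(history, label=f"GD, eta={eta}")
plt.legend()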
