I want to implement Coordinate Descent in Python and compare the result with that of Gradient Descent. I wrote the code below, but it does not work well: GD seems more or less OK, but CD is not.
This is the reference I used for Coordinate Descent: LINK
And this is the result I expected: [plot: Gradient Descent vs Coordinate Descent]
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(10, 100, 1000)
y = np.linspace(10, 100, 1000)

def func(x, y, param):
    return param[0] * x + param[1] * y

def costf(X, y, param):
    return np.sum((X.dot(param) - y) ** 2)/(2 * len(y))

z = func(x, y, [5, 8]) + np.random.normal(0., 10.)
z = z.reshape(-1, 1)

interc = np.ones(1000)
X = np.concatenate([interc.reshape(-1, 1), x.reshape(-1, 1), y.reshape(-1, 1)], axis=1)

#param = np.random.randn(3).T
param = np.array([2, 2, 2])

def gradient_descent(X, y, param, eta=0.01, iter=100):
    cost_history = [0] * iter
    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y
        gradient = X.T.dot(loss)/(2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost
    return param, cost_history

def coordinate_descent(X, y, param, iter=100):
    cost_history = [0] * iter
    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost
    return param, cost_history

ret, xret = gradient_descent(X, z, param)
cret, cxret = coordinate_descent(X, z, param)

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
plt.show()
Thank you in advance.
I'm not an expert in optimization; here is just some advice.
Your code looks pretty good to me, except that:
1. The following two lines seem wrong (see the short demo right after this list):
    loss = h - y
    param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))
You reshaped y to (n_samples, 1) with z = z.reshape(-1, 1), while h and dele have shape (n_samples,). Subtracting them broadcasts to an (n_samples, n_samples) matrix, which I think is not what you wanted.
2. Your data might not be that good for test purposes. That doesn't mean it won't work, but you may need some extra effort to tune the learning rate / number of iterations.
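To illustrate point 1, here is a minimal sketch (plain NumPy, with arbitrary placeholder shapes, not taken from your script) of how the shape mismatch turns the residual into a matrix, and how ravel() avoids it:

import numpy as np

n = 5
h = np.zeros(n)           # shape (n,), like X.dot(param)
y_col = np.zeros((n, 1))  # shape (n, 1), like z after reshape(-1, 1)

# Broadcasting (n,) against (n, 1) yields an (n, n) matrix instead of a residual vector.
print((h - y_col).shape)          # (5, 5)

# Flattening y first keeps the residual a vector of length n.
print((h - y_col.ravel()).shape)  # (5,)

Using y.ravel() as in the code below, or simply keeping y one-dimensional from the start, avoids this.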
I tried your code on the Boston house-prices dataset; the output looks like this:
[plot: GD and CD cost histories over iterations]
The code:
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

# boston house-prices dataset
data = load_boston()
X, y = data.data, data.target
X = StandardScaler().fit_transform(X)  # for easy convergence
X = np.hstack([X, np.ones((X.shape[0], 1))])

param = np.zeros(X.shape[1])

def gradient_descent(X, y, param, eta=0.01, iter=300):
    cost_history = [0] * (iter+1)
    cost_history[0] = costf(X, y, param)  # you may want to save initial cost
    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y.ravel()
        gradient = X.T.dot(loss)/(2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration+1] = cost
    return param, cost_history

def coordinate_descent(X, y, param, iter=300):
    cost_history = [0] * (iter+1)
    cost_history[0] = costf(X, y, param)
    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:,i].T, (y.ravel() - dele))/np.sum(np.square(X[:,i]))
        cost = costf(X, y, param)
        cost_history[iteration+1] = cost
    return param, cost_history

ret, xret = gradient_descent(X, y, param)
cret, cxret = coordinate_descent(X, y, param.copy())

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
If this is what you want, you can then try it on some other data. Just be careful that even if the implementation is right, you still need to tune the hyperparameters to get good convergence.
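For example, a quick, hypothetical sweep over the learning rate (reusing X, y and the gradient_descent defined above; the values are just placeholders) could look like this:

# Hypothetical sweep: rerun GD from the same zero start and compare final costs.
for eta in [0.001, 0.01, 0.1]:
    _, history = gradient_descent(X, y, np.zeros(X.shape[1]), eta=eta, iter=300)
    print(f"eta={eta}: final cost = {history[-1]:.4f}")

Coordinate descent as written has no learning rate, so there the main knob is the number of sweeps (iter).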