 

scipy.optimize solution using python for the following equation

I am very new to scipy and to doing data analysis in Python. I am trying to solve the following regularized, constrained optimization problem with scipy.optimize, and unfortunately I haven't been able to make much sense of the scipy documentation.

Here is the function I am looking to minimize:

    minimize_W   0.5*||A - A*W||_F^2  +  0.5*beta*||W||_F^2  +  lambda*||W||_1

Here A is an m x n matrix, the first term in the minimization is the residual sum of squares, the second term is the Frobenius (matrix L2) norm of a sparse n x n matrix W, and the third is the L1 norm of the same matrix W.


I would like to know how to minimize this function subject to the constraints that:

w_i,j >= 0    (all entries of W are nonnegative)
w_j,j  = 0    (the diagonal of W is zero)

I would like to use coordinate descent (or any other method that scipy.optimize provides) to solve the above problem. I would like some direction on how to achieve this, as I have no idea how to express the Frobenius norm term, how to tune the parameters beta and lambda, or whether scipy.optimize will tune and return those parameters for me. Any help regarding these questions would be much appreciated.
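
For concreteness, the two norm terms themselves are straightforward to evaluate in NumPy, assuming W is stored as a plain NumPy array (the matrix below is just an example):

import numpy as np

W = np.random.rand(4, 4)          # example matrix
frob_sq = (W**2).sum()            # squared Frobenius norm, equals np.linalg.norm(W, 'fro')**2
l1_norm = np.abs(W).sum()         # elementwise L1 norm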

Thanks in advance!

asked Nov 19 '13 by anonuser0428


2 Answers

How large are m and n?

Here is a basic example of how to use fmin:

from scipy import optimize
import numpy as np

m = 5
n = 3

a = np.random.rand(m, n)
idx = np.arange(n)

def func(w, beta, lam):
    # reparameterize so the constraints hold automatically:
    # take absolute values (w_ij >= 0) and zero out the diagonal (w_jj = 0)
    w = w.reshape(n, n)
    w2 = np.abs(w)
    w2[idx, idx] = 0
    # 0.5*||a - a*W||_F^2 + lam*||W||_1 + 0.5*beta*||W||_F^2
    return 0.5*((a - np.dot(a, w2))**2).sum() + lam*w2.sum() + 0.5*beta*(w2**2).sum()

w = optimize.fmin(func, np.random.rand(n*n), args=(0.1, 0.2))
# apply the same transform to the returned vector to recover the feasible W
w = w.reshape(n, n)
w[idx, idx] = 0
w = np.abs(w)
print(w)
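
If you would rather have the optimizer enforce the nonnegativity bound directly instead of taking absolute values, a minimal sketch along the same lines uses optimize.minimize with the L-BFGS-B method and box bounds (the beta/lam values here are just placeholders):

from scipy import optimize
import numpy as np

m, n = 5, 3
a = np.random.rand(m, n)
idx = np.arange(n)

def func(w, beta, lam):
    w = w.reshape(n, n).copy()
    w[idx, idx] = 0                      # zero diagonal enforced inside the objective
    r = a - np.dot(a, w)
    # with w >= 0 the L1 term is just w.sum()
    return 0.5*(r**2).sum() + 0.5*beta*(w**2).sum() + lam*w.sum()

bounds = [(0, None)] * (n*n)             # w_ij >= 0
res = optimize.minimize(func, np.random.rand(n*n), args=(0.1, 0.2),
                        method='L-BFGS-B', bounds=bounds)
w = res.x.reshape(n, n)
w[idx, idx] = 0
print(w)

This keeps the constraint handling in the optimizer rather than in the objective, at the cost of relying on numerical gradients.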

If you want to use coordinate descent, you can implement it with Theano.

http://deeplearning.net/software/theano/
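
If you only need a simple reference implementation rather than Theano, a coordinate-descent loop for this objective can also be written in plain NumPy; the sketch below assumes the same objective and constraints as above and uses the standard nonnegative elastic-net coordinate update, so treat it as a starting point rather than a tuned solver:

import numpy as np

def coordinate_descent(A, beta, lam, n_iter=100):
    _, n = A.shape
    W = np.zeros((n, n))
    col_sq = (A**2).sum(axis=0)            # a_i.T a_i for every column i of A
    for _ in range(n_iter):
        for j in range(n):
            r = A[:, j] - A.dot(W[:, j])   # residual of column j
            for i in range(n):
                if i == j:
                    continue               # keep the diagonal at zero
                r += A[:, i]*W[i, j]       # remove coordinate (i, j) from the residual
                # closed-form minimizer of the 1-D subproblem, clipped at zero
                W[i, j] = max(0.0, (A[:, i].dot(r) - lam) / (col_sq[i] + beta))
                r -= A[:, i]*W[i, j]       # put the updated coordinate back
    return W

A = np.random.rand(5, 3)
print(coordinate_descent(A, beta=0.1, lam=0.2))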

answered by HYRY


Your problem seems tailor-made for cvxopt - http://cvxopt.org/ and in particular http://cvxopt.org/userguide/solvers.html#problems-with-nonlinear-objectives

Using fmin would likely be slower, since it does not take advantage of gradient/Hessian information.

The code in HYRY's answer also has the drawback that, as far as fmin is concerned, the diagonal of W is a variable, so fmin will try to move the diagonal values around until it realizes that they don't do anything (since the objective function resets them to zero). Here is a cvxopt implementation of HYRY's code that explicitly enforces the zero constraints and uses gradient information. WARNING: I couldn't derive the Hessian for your objective... and you might want to double-check the gradient as well:

# CVXOPT version
# warning: CVXOPT uses column-major (Fortran) order

from numpy import *
from cvxopt import matrix, mul
from cvxopt import solvers

m = 5
n = 3
n_active = n*(n-1)                  # number of free variables: the off-diagonal entries of W

A = matrix(random.rand(m*n), (m, n))

beta = 0.1
lam = 0.2
W = matrix(zeros(n*n), (n, n))

def cvx_objective_func(w=None, z=None):
    if w is None:
        # first call from solvers.cp: report 0 nonlinear constraints and a starting point
        num_nonlinear_constraints = 0
        w_0 = matrix(1., (n_active, 1), 'd')
        return num_nonlinear_constraints, w_0
    # main call: rebuild W from the flattened off-diagonal entries
    # (column-major order, the diagonal stays zero)
    _w = matrix(w, (n, n-1))
    for k in range(n):
        W[k, 0:k]   = _w[k, 0:k]
        W[k, k+1:n] = _w[k, k:n-1]
    residual = A - A*W
    objective_value = (.5*sum(mul(residual, residual))
                       + .5*beta*sum(mul(W, W))
                       + lam*sum(abs(W)))
    # gradient with respect to W (not sure if I calculated this right...)
    _Df = -A.T*residual + beta*W + lam
    # pack the off-diagonal entries of _Df back into the column-major layout of w
    Df = matrix(0., (1, n_active))
    for jdx in range(n):
        for idx in range(n):
            if idx == jdx:
                continue
            c = jdx if jdx < idx else jdx - 1
            Df[0, c*n + idx] = _Df[idx, jdx]
    if z is None:
        return objective_value, Df
    # Also form the Hessian of objective + nonlinear constraints
    # (there are no nonlinear constraints). This is the trickiest part...
    # WARNING: H is for sure coded wrong
    H = matrix(1., (n_active, n_active))
    return objective_value, Df, H

num_nl, w_0 = cvx_objective_func()
print(cvx_objective_func(w_0))

# linear inequality G*w <= h encodes the elementwise constraint w >= 0
G = -matrix(diag(ones(n_active)), (n_active, n_active))
h = matrix(0., (n_active, 1), 'd')
print(solvers.cp(cvx_objective_func, G=G, h=h))
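
For what it's worth, here is one hedged sketch of how the H placeholder above might be filled in. It assumes the same column-major free-variable layout as the unpacking loop, and it reuses A, beta, n and n_active from the code above; only the squared-error and Frobenius terms contribute, since the L1 term is linear once w >= 0 is enforced through G and h. I have not verified it against solvers.cp, so double-check before relying on it:

from cvxopt import matrix

def var_index(i, j):
    # position of W[i, j] (i != j) inside the flattened variable w,
    # matching the column-major unpacking in cvx_objective_func above
    c = j if j < i else j - 1
    return c*n + i

AtA = A.T*A                        # per-column Hessian block of 0.5*||A - A*W||_F^2
H0 = matrix(0., (n_active, n_active))
for j in range(n):                 # entries of W in different columns do not interact
    for i in range(n):
        if i == j:
            continue
        for k in range(n):
            if k == j:
                continue
            H0[var_index(i, j), var_index(k, j)] = AtA[i, k] + (beta if i == k else 0.)

# inside cvx_objective_func the final branch would then be
#     return objective_value, Df, z[0]*H0
# since solvers.cp expects the Hessian scaled by the multiplier z[0].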

Having said that, the tricks used to eliminate the equality/inequality constraints in HYRY's code are quite cute.

answered by alexandre iolov