 

Jacobian and Hessian inputs in `scipy.optimize.minimize`

I am trying to understand how the "dogleg" method works in Python's scipy.optimize.minimize function. I am adapting the example at the bottom of the help page.

The dogleg method requires a Jacobian and Hessian argument according to the notes. For this I use the numdifftools package:

import numpy as np
from scipy.optimize import minimize
from numdifftools import Jacobian, Hessian

def fun(x,a):
    return (x[0] - 1)**2 + (x[1] - a)**2

x0 = np.array([2,0]) # initial guess
a = 2.5

res = minimize(fun, x0, args=(a), method='dogleg',
               jac=Jacobian(fun)([2,0]), hess=Hessian(fun)([2,0]))

print(res)

Edit:

If I make a change as suggested by a post below,

res = minimize(fun, x0, args=a, method='dogleg',
               jac=Jacobian(lambda x: fun(x,a)),
               hess=Hessian(lambda x: fun(x,a)))

I get the error TypeError: <lambda>() takes 1 positional argument but 2 were given. What am I doing wrong?

Also is it correct to calculate the Jacobian and Hessian at the initial guess x0?
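
For reference, here is a minimal sketch of the calling pattern described in the scipy docs: args is forwarded to fun, jac and hess alike, so the derivative callables need the same (x, a) signature as fun, and jac/hess are passed as callables that minimize evaluates at each iterate rather than as arrays precomputed at x0 (the fun_der/fun_hess wrappers below are only illustrative):

import numpy as np
from scipy.optimize import minimize
from numdifftools import Jacobian, Hessian

def fun(x, a):
    return (x[0] - 1)**2 + (x[1] - a)**2

# minimize forwards args to fun, jac and hess, so the derivative
# callables must accept the same extra argument a as fun does
def fun_der(x, a):
    return Jacobian(lambda x: fun(x, a))(x).ravel()

def fun_hess(x, a):
    return Hessian(lambda x: fun(x, a))(x)

x0 = np.array([2.0, 0.0])  # initial guess
a = 2.5

# jac and hess are callables evaluated at every iterate,
# not arrays precomputed at x0
res = minimize(fun, x0, args=(a,), method='dogleg',
               jac=fun_der, hess=fun_hess)
print(res)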

asked Dec 14 '16 by Medulla Oblongata


1 Answer

I get that this is a toy example, but I would like to point out that using a tool like Jacobian or Hessian to calculate the derivatives numerically, instead of deriving them yourself, is fairly costly. For example, with numdifftools-based fun_der and fun_hess wrappers (your method):

x0 = np.array([2, 0])
a = 2.5
%timeit minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
100 loops, best of 3: 13.6 ms per loop

But you could calculate the derivative functions as such:

def fun_der(x, a):
    # gradient of fun with respect to x
    dx = 2 * (x[0] - 1)
    dy = 2 * (x[1] - a)
    return np.array([dx, dy])

def fun_hess(x, a):
    # Hessian of fun: a constant diagonal matrix
    dx = 2
    dy = 2
    return np.diag([dx, dy])

%timeit minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
1000 loops, best of 3: 279 µs per loop
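
For completeness, these closed-form derivatives follow directly from the toy objective:

\[
f(x) = (x_0 - 1)^2 + (x_1 - a)^2, \qquad
\nabla f(x) = \begin{pmatrix} 2(x_0 - 1) \\ 2(x_1 - a) \end{pmatrix}, \qquad
\nabla^2 f(x) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.
\]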

As the two timings show, the explicit derivatives are almost 50x faster, and the gap really starts to add up with complex functions. As such, I always try to derive the functions explicitly myself, regardless of how difficult that may be. One fun example is the kernel-based implementation of Inductive Matrix Completion.

argmin --> sum((A - gamma_u(X) Z gamma_v(Y))**2 - lambda * ||Z||**2)
where gamma_u = (1/sqrt(m_x)) * [cos(UX), sin(UX)] and
gamma_v = (1/sqrt(m_y)) * [cos(VY), sin(VY)]
X.shape = n_x, p; Y.shape = n_y, q; U.shape = m_x, p; V.shape = m_y, q; Z.shape = 2m_x, 2m_y

Calculating the gradient and Hessian numerically for this objective is extremely unreasonable in comparison to explicitly deriving and using those functions. So, as @bnaul pointed out, if your function does have closed-form derivatives you really do want to calculate and use them.

answered Oct 21 '22 by Grr