I am trying to understand how the "dogleg" method works in Python's scipy.optimize.minimize
function. I am adapting the example at the bottom of the help page.
The dogleg method requires a Jacobian and Hessian argument according to the notes. For this I use the numdifftools
package:
import numpy as np
from scipy.optimize import minimize
from numdifftools import Jacobian, Hessian
def fun(x, a):
    return (x[0] - 1)**2 + (x[1] - a)**2

x0 = np.array([2, 0])  # initial guess
a = 2.5

res = minimize(fun, x0, args=(a), method='dogleg',
               jac=Jacobian(fun)([2, 0]), hess=Hessian(fun)([2, 0]))
print(res)
Edit:
If I make a change as suggested by a post below,
res = minimize(fun, x0, args=a, method='dogleg',
               jac=Jacobian(lambda x: fun(x, a)),
               hess=Hessian(lambda x: fun(x, a)))
I get the error TypeError: <lambda>() takes 1 positional argument but 2 were given. What am I doing wrong?
Also, is it correct to calculate the Jacobian and Hessian at the initial guess x0?
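For what it's worth, here is a minimal sketch of one way to wire this up (the wrapper names fun_der and fun_hess are not required names; they are chosen to match the timing example further down). Note that minimize forwards everything in args to jac and hess as well, so those callables must accept (x, a), and they should be passed as callables that get re-evaluated at every iterate, not as arrays computed once at x0:
import numpy as np
from scipy.optimize import minimize
from numdifftools import Jacobian, Hessian

def fun(x, a):
    return (x[0] - 1)**2 + (x[1] - a)**2

def fun_der(x, a):
    # numerical gradient w.r.t. x; `a` is accepted because minimize
    # passes the contents of `args` to jac (and hess) as well
    return Jacobian(lambda xx: fun(xx, a))(x).ravel()

def fun_hess(x, a):
    # numerical Hessian w.r.t. x
    return Hessian(lambda xx: fun(xx, a))(x)

x0 = np.array([2.0, 0.0])
a = 2.5
res = minimize(fun, x0, args=(a,), method='dogleg',
               jac=fun_der, hess=fun_hess)
print(res.x)  # should approach [1, 2.5]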
scipy.optimize.minimize can be terminated by using tol and maxiter (maxfev as well for some optimization methods). There are also method-specific stopping criteria such as xtol, ftol, gtol, etc., as described in the scipy documentation.
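A hedged sketch of what that looks like for this example, reusing fun, fun_der, fun_hess, x0 and a from the sketch above (maxiter and gtol are among the options scipy documents for the dogleg / trust-region methods):
# generic control: tol plus an iteration cap via options
res = minimize(fun, x0, args=(a,), method='dogleg',
               jac=fun_der, hess=fun_hess,
               tol=1e-8,                   # generic tolerance knob on minimize itself
               options={'maxiter': 200})   # cap the number of iterations
print(res.nit, res.message)

# method-specific control: dogleg's gtol (stop once the gradient norm is small)
res = minimize(fun, x0, args=(a,), method='dogleg',
               jac=fun_der, hess=fun_hess,
               options={'gtol': 1e-8})
print(res.nit, res.message)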
I get that this is a toy example, but I would like to point out that using a tool like Jacobian or Hessian to calculate the derivatives numerically, instead of deriving them yourself, is fairly costly. For example, with your method (derivatives built with numdifftools):
x0 = np.array([2, 0])
a = 2.5
%timeit minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
100 loops, best of 3: 13.6 ms per loop
But you could calculate the derivative functions explicitly, like so:
def fun_der(x, a):
    dx = 2 * (x[0] - 1)
    dy = 2 * (x[1] - a)
    return np.array([dx, dy])
def fun_hess(x, a):
    dx = 2
    dy = 2
    return np.diag([dx, dy])
%timeit minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
1000 loops, best of 3: 279 µs per loop
As you can see, that is almost 50x faster. It really starts to add up with complex functions. As such, I always try to derive the functions explicitly myself, regardless of how difficult that may be. One fun example is the kernel-based implementation of Inductive Matrix Completion.
argmin --> sum((A - gamma_u(X) Z gamma_v(Y))**2 - lambda * ||Z||**2)
where gamma_u = (1/sqrt(m_x)) * [cos(UX), sin(UX)] and
gamma_v = (1/sqrt(m_y)) * [cos(VY), sin(VY)]
X.shape = n_x, p; Y.shape = n_y, q; U.shape = m_x, p; V.shape = m_y, q; Z.shape = 2m_x, 2m_y
Calculating the gradient and Hessian numerically from this equation is extremely unreasonable in comparison to deriving them explicitly and using those functions. So, as @bnaul pointed out, if your function does have closed-form derivatives you really do want to calculate and use them.
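Whichever way you go, it is cheap to sanity-check a hand-derived gradient and Hessian against a numerical approximation before trusting them. Here is a small sketch using scipy.optimize.check_grad and numdifftools on the toy fun from above (the definitions are repeated so the snippet runs on its own):
import numpy as np
from scipy.optimize import check_grad
from numdifftools import Hessian

def fun(x, a):
    return (x[0] - 1)**2 + (x[1] - a)**2

def fun_der(x, a):
    return np.array([2 * (x[0] - 1), 2 * (x[1] - a)])

def fun_hess(x, a):
    return np.diag([2, 2])

x0, a = np.array([2.0, 0.0]), 2.5

# check_grad returns the norm of the difference between the analytic
# gradient and a finite-difference approximation; it should be ~0
print(check_grad(fun, fun_der, x0, a))

# cross-check the hand-written Hessian against a numerical one
print(np.allclose(fun_hess(x0, a), Hessian(lambda x: fun(x, a))(x0)))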