Optim.jl: negative inverse Hessian

I am using the Optim.jl library to minimise a function in Julia, using the BFGS algorithm. Earlier today I asked another question about the same library, but to avoid confusion I decided to split it into two.

I would also like to get an estimate of the negative inverse Hessian after the optimisation, for further calculations.

On the GitHub page of the Optim library, I found the following working example:

using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result        = optimize(rosenbrock, zeros(2), BFGS())

How can I get the negative inverse Hessian from result after the optimisation? Is there any field in it holding the Hessian, the inverse Hessian, or the negative inverse Hessian?

EDIT

Thanks for the comments. Do you think it would be more efficient to edit "optimize.jl" so that the function would also return the inverse Hessian? See below for a working example; the edit has been introduced at line 226:

if state.method_string == "BFGS"
    return MultivariateOptimizationResults(state.method_string,
                                           initial_x,
                                           f_increased ? state.x_previous : state.x,
                                           f_increased ? state.f_x_previous : state.f_x,
                                           iteration,
                                           iteration == options.iterations,
                                           x_converged,
                                           options.x_tol,
                                           f_converged,
                                           options.f_tol,
                                           g_converged,
                                           options.g_tol,
                                           f_increased,
                                           tr,
                                           state.f_calls,
                                           state.g_calls,
                                           state.h_calls), state.invH
else
    return MultivariateOptimizationResults(state.method_string,
                                           initial_x,
                                           f_increased ? state.x_previous : state.x,
                                           f_increased ? state.f_x_previous : state.f_x,
                                           iteration,
                                           iteration == options.iterations,
                                           x_converged,
                                           options.x_tol,
                                           f_converged,
                                           options.f_tol,
                                           g_converged,
                                           options.g_tol,
                                           f_increased,
                                           tr,
                                           state.f_calls,
                                           state.g_calls,
                                           state.h_calls)
end

Or just:

return MultivariateOptimizationResults(state.method_string,
                                        initial_x,
                                        f_increased ? state.x_previous : state.x,
                                        f_increased ? state.f_x_previous : state.f_x,
                                        iteration,
                                        iteration == options.iterations,
                                        x_converged,
                                        options.x_tol,
                                        f_converged,
                                        options.f_tol,
                                        g_converged,
                                        options.g_tol,
                                        f_increased,
                                        tr,
                                        state.f_calls,
                                        state.g_calls,
                                        state.h_calls), state

This would give full access to "state" after the optimization.
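
With either edit, the caller would then receive a tuple rather than just the results object. A minimal sketch of what the usage might look like (hypothetical, since it assumes the first patch above has been applied):

# hypothetical call site for the patched optimize, which would also return state.invH
result, invH = optimize(rosenbrock, zeros(2), BFGS())
neg_invH = -invH   # the negative inverse Hessian I am after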

EDIT 2

Since this change will be introduced in the new version of the Optim.jl library, there is no need to continue this discussion. For now, the extended_trace and the after_while! tricks both work. Personally, I prefer the latter, so I will close the discussion by accepting @Dan Getz's answer.

asked Apr 01 '17 by merch

3 Answers

I know of one way, though I'm not sure whether it's worth it compared with estimating the inverse Hessian yourself. If you pass Optim.Options(store_trace=true, extended_trace=true), you get a record of the optimization path, which includes the last ~inv(H). For example, after

result = optimize(rosenbrock, zeros(2), BFGS(),
                  Optim.Options(store_trace=true, extended_trace=true));

we can get

julia> result.trace[end]
    16     5.471433e-17     2.333740e-09
 * Current step size: 1.0091356566200325
 * g(x): [2.33374e-9,-1.22984e-9]
 * ~inv(H): [0.498092 0.996422; 0.996422 1.9983]
 * x: [1.0,1.0]


julia> result.trace[end].metadata["~inv(H)"]
2×2 Array{Float64,2}:
 0.498092  0.996422
 0.996422  1.9983  
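
Since the question asks for the negative inverse Hessian, it is then just a matter of negating this matrix, e.g.

neg_invH = -result.trace[end].metadata["~inv(H)"]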

There are at least two things I don't like about this approach, though:

The first is that turning on extended_trace=true seems to force show_trace=true -- you'll notice I didn't show the output of the computation! This feels like a bug. It can be mitigated (though not removed) by setting show_every to something big, or avoided by redirecting stdout entirely (a sketch of this follows below).

The second is that we should probably be able to get access to the last state without storing the whole path, but whether this is actually an issue will depend upon the size of your problem.
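
For the stdout workaround mentioned above, here is a minimal sketch (assuming the do-block forms of open and redirect_stdout, and using a throwaway temporary file to swallow the forced trace printing):

# run the optimization while stdout is redirected to a scratch file
result = open(tempname(), "w") do io
    redirect_stdout(io) do
        optimize(rosenbrock, zeros(2), BFGS(),
                 Optim.Options(store_trace=true, extended_trace=true))
    end
end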

answered by DSM

Another less-than-optimal approach that works is hooking into the internal Optim function after_while!, which currently does nothing, and using it to pull the information from the last state.

In Julia code this looks like:

julia> using Optim

julia> rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)

julia> Optim.after_while!{T}(d, state::Optim.BFGSState{T}, method::BFGS, options) = global invH = state.invH

julia> result        = optimize(rosenbrock, zeros(2), BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926033423,0.9999999852005353]
 * Minimum: 5.471433e-17
 * Iterations: 16
 * Convergence: true
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
   * |g(x)| < 1.0e-08: true
   * f(x) > f(x'): false
   * Reached Maximum Number of Iterations: false
 * Objective Function Calls: 69
 * Gradient Calls: 69

julia> invH
2×2 Array{Float64,2}:
 0.498092  0.996422
 0.996422  1.9983  

This is unattractive because it uses a global variable and because it depends on after_while! being defined before optimize is run/compiled (but maybe in v0.6 this is already solved).

As @DSM noted in his answer, it is natural to want access to the last state of the optimizer. If tracing is not the answer, maybe this is.

answered by Dan Getz

The least hacky way to do this in Optim.jl at the moment is the following.

First, load Optim and OptimTestProblems (to have an example to work off of)

julia> using Optim, OptimTestProblems

julia> prob = OptimTestProblems.UnconstrainedProblems.examples["Himmelblau"]
OptimTestProblems.MultivariateProblems.OptimizationProblem{Void,Void,Float64,String,Void}("Himmelblau", OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau, OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau_gradient!, nothing, OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau_hessian!, nothing, [2.0, 2.0], [3.0, 2.0], 0.0, true, true, nothing)

Then specify all the parts optimize needs and enter them in the correct order:

julia> x0 = prob.initial_x
2-element Array{Float64,1}:
 2.0
 2.0

julia> obj = OnceDifferentiable(prob.f, prob.g!, x0)
NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1},Val{false}}(OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau, OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau_gradient!, NLSolversBase.fg!, 0.0, [NaN, NaN], [NaN, NaN], [NaN, NaN], [0], [0])

julia> m = BFGS()
Optim.BFGS{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64},Optim.##67#69}(LineSearches.InitialStatic{Float64}
  alpha: Float64 1.0
  scaled: Bool false
, LineSearches.HagerZhang{Float64}
  delta: Float64 0.1
  sigma: Float64 0.9
  alphamax: Float64 Inf
  rho: Float64 5.0
  epsilon: Float64 1.0e-6
  gamma: Float64 0.66
  linesearchmax: Int64 50
  psi3: Float64 0.1
  display: Int64 0
, Optim.#67, Optim.Flat())

julia> options = Optim.Options()
Optim.Options{Float64,Void}(1.0e-32, 1.0e-32, 1.0e-8, 0, 0, 0, false, 0, 1000, false, false, false, 1, nothing, NaN)

julia> bfgsstate = Optim.initial_state(m, options, obj, x0)
Optim.BFGSState{Array{Float64,1},Array{Float64,2},Float64,Array{Float64,1}}([2.0, 2.0], [6.91751e-310, 6.9175e-310], [-42.0, -18.0], NaN, [6.9175e-310, 0.0], [6.91751e-310, 0.0], [6.91751e-310, 0.0], [1.0 0.0; 0.0 1.0], [6.91751e-310, 0.0], NaN, [6.9175e-310, 6.91751e-310], 1.0, false, LineSearches.LineSearchResults{Float64}(Float64[], Float64[], Float64[], 0))

julia> res = optimize(obj, x0, m, options, bfgsstate)
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [2.0,2.0]
 * Minimizer: [2.9999999999998894,2.000000000000162]
 * Minimum: 5.406316e-25
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 5.81e-09 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 2.93e+09 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.95e-12 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 42
 * Gradient Calls: 42

Then we can access the inverse Hessian from the state that was mutated in optimize.

julia> bfgsstate.invH
2×2 Array{Float64,2}:
  0.0160654   -0.00945561
 -0.00945561   0.034967  

We can then compare it to the inverse Hessian obtained by inverting the actual Hessian.

julia> H=similar(bfgsstate.invH)
2×2 Array{Float64,2}:
 6.91751e-310  6.91751e-310
 6.91751e-310  6.91751e-310

julia> prob.h!(H, Optim.minimizer(res))
34.00000000000832

julia> H
2×2 Array{Float64,2}:
 74.0  20.0
 20.0  34.0

julia> inv(H)
2×2 Array{Float64,2}:
  0.0160681  -0.0094518
 -0.0094518   0.0349716

This is similar to the one obtained at the last step of the BFGS run.
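
As a quick sanity check (a sketch; the tolerance is a loose, illustrative choice), the two matrices agree within a relative tolerance of 1e-2:

julia> isapprox(bfgsstate.invH, inv(H); rtol=1e-2)
true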

answered by pkofod