I came across the basin hopping algorithm in scipy and created a simple problem to understand how to use it but it doesnt seem to be working correctly for that problem. May be I'm doing something completely wrong.
Here is the code:
import scipy.optimize as spo
import numpy as np
minimizer_kwargs = {"method":"BFGS"}
f1=lambda x: (x-4)
def mybounds(**kwargs):
x = kwargs["x_new"]
tmax = bool(np.all(x <= 1.0))
tmin = bool(np.all(x >= 0.0))
print x
print tmin and tmax
return tmax and tmin
def print_fun(x, f, accepted):
print("at minima %.4f accepted %d" % (f, int(accepted)))
x0=[1.]
spo.basinhopping(f1,x0,accept_test=mybounds,callback=print_fun,niter=200,minimizer_kwargs=minimizer_kwargs)
The solution it gives is x: array([ -1.80746874e+08])
SciPy optimize provides functions for minimizing (or maximizing) objective functions, possibly subject to constraints. It includes solvers for nonlinear problems (with support for both local and global optimization algorithms), linear programing, constrained and nonlinear least-squares, root finding, and curve fitting.
Basin Hopping is a global optimization algorithm developed for use in the field of chemical physics. Basin-Hopping (BH) or Monte-Carlo Minimization (MCM) is so far the most reliable algorithms in chemical physics to search for the lowest-energy structure of atomic clusters and macromolecular systems.
Objective functions in scipy. optimize expect a numpy array as their first parameter which is to be optimized and must return a float value. The exact calling signature must be f(x, *args) where x represents a numpy array and args a tuple of additional arguments supplied to the objective function.
The function you are testing makes use of an approach called Metropolis-Hastings, which can be modified into a procedure called simulated annealing that can optimze functions in a stochastic way.
The way this works is as follows. First you pick a point, like your point x0
. From that point, you generate a random perturbation (this is called a "proposal"). Once there is a proposed perturbation, you get your candidate for a new point by applying the perturbation to your current out. So, you could think of it like x1 = x0 + perturbation
.
In regular old gradient descent, the perturbation
term is just a deterministically calculated quantity, like a step in the direction of the gradient. But in Metropolis-Hastings, perturbation
is generated randomly (sometimes by using the gradient as a clue about where to randomly go... but sometimes just randomly with no clues).
At this point when you've got x1
, you have to ask yourself: "Did I do a good thing by randomly perturbing x0
or did I just mess everything up?" One part of that has to do with sticking inside of some bounds, such as your mybounds
function. Another part of it has to do with how much better/worse the objective function's value has become at the new point.
So there are two ways to reject the proposal of x1
: first, it could violate the bounds you have set and be an infeasible point by the problem's definition; second, from the accept/reject assessment step of Metropolis-Hastings, it could just be a very bad point that should be rejected. In either case, you will reject x1
and instead set x1 = x0
and pretend as if you just stayed in the same spot to try again.
Contrast that with a gradient-type method, where you'll definitely, no matter what, always make at least some kind of movement (a step in the direction of the gradient).
Whew, all right. With that all aside, let's think about how this plays out with the basinhopping
function. From the documentation we can see that the typical acceptance condition is accessed by the take_step
argument, and the documentation saying this: "The default step taking routine is a random displacement of the coordinates, but other step taking algorithms may be better for some systems." So, even apart from your mybounds
bounds checker, the function will be making a random displacement of the coordinates to generate its new point to try. And since the gradient of this function is just the constant 1
, it will always be making the same big steps in the direction of negative gradient (for minimizing).
At a practical level, this means that the proposed points for x1
are always going to be quite outside of the interval [0,1]
and your bounds checker will always veto them.
When I run your code, I see this happening all the time:
In [5]: spo.basinhopping(f1,x0,accept_test=mybounds,callback=print_fun,niter=200,minimizer_kwargs=minimizer_kwargs)
at minima -180750994.1924 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.5530 accepted 0
[ -1.80746873e+08]
False
at minima -180746877.3896 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.7281 accepted 0
[ -1.80746874e+08]
False
at minima -180746878.2433 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.5774 accepted 0
[ -1.80746874e+08]
False
at minima -180746878.3173 accepted 0
[ -1.80750990e+08]
False
at minima -180750994.3509 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.6605 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.6966 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.6900 accepted 0
[ -1.80750990e+08]
False
at minima -180750993.9707 accepted 0
[ -1.80750990e+08]
False
at minima -180750994.0494 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.5824 accepted 0
[ -1.80746874e+08]
False
at minima -180746877.5459 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.6679 accepted 0
[ -1.80750991e+08]
False
at minima -180750994.5823 accepted 0
[ -1.80750990e+08]
False
at minima -180750993.9308 accepted 0
[ -1.80746874e+08]
False
at minima -180746878.0395 accepted 0
[ -1.80750991e+08]
False
# ... etc.
So it's never accepting the posposal points. The output is not telling you that it has found a solution. It's telling you than the random perturbing to explore possible solutions keeps resulting in points that look better and better to the optimizer, but which keep failing to satisfy your criteria. It can't find its way all the way back to [0,1]
to get points that do satisfy mybounds
.
The behavior of basin-hopping as you've coded it is to combine perturbations with local minimization.
Your routine keeps producing unacceptable because of the local optimization piece. Essentially the BFGS routine you're using is completely unconstrained, so it follows the gradient out to negative infinity. This result then gets fed back into your checker.
So it doesn't matter where your Basin-Hopping point x1
is, the BFGS part always goes to a massive negative value.
The benchmark function x - 4
that you're using is not the ideal target here. Check out e.g. the Rastrigin function. If you actually need to optimize a linear function, there's a whole class of algorithms to do that (see Linear Programming on Wikipedia).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With