
Parallel many dimensional optimization

I am building a script that generates input data (parameters) for another program to calculate. I would like to optimize the resulting data. Previously I have been using SciPy's Powell optimization. The pseudo code looks something like this.

import scipy.optimize

def value(param):
    run_program(param)        # launch the expensive external program
    result = parse_output()   # parse its output into a single scalar
    return result

scipy.optimize.fmin_powell(value, param)

This works great; however, it is incredibly slow, as each iteration of the program can take days to run. What I would like to do is coarse-grain parallelize this. So instead of running a single evaluation at a time, it would run (number of parameters) * 2 at a time. For example:

Initial guess: param = [1, 2, 3, 4, 5]

# Step sizes, adjustable at each iteration
jump = [1, 1, 1, 1, 1]
# Shift each variable by plus/minus its jump
for num, a in enumerate(param):
    new_param1 = param[:]
    new_param1[num] = new_param1[num] + jump[num]
    run_program(new_param1)
    new_param2 = param[:]
    new_param2[num] = new_param2[num] - jump[num]
    run_program(new_param2)

# Wait until all programs are complete -> parse output
Output = [[value, param], ...]
# Create a new guess
# Repeat
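The 2*N candidate evaluations above are independent, so they can be farmed out with, for example, `multiprocessing`. A minimal sketch, where `value` is a cheap stand-in for running and parsing the external program:

```python
from multiprocessing import Pool

def value(param):
    # Cheap stand-in for run_program + output parsing (hypothetical)
    return sum((p - 3) ** 2 for p in param)

def make_candidates(param, jump):
    """Build the 2*N plus/minus candidates around the current guess."""
    candidates = []
    for i in range(len(param)):
        for sign in (+1, -1):
            new_param = list(param)
            new_param[i] += sign * jump[i]
            candidates.append(new_param)
    return candidates

if __name__ == "__main__":
    param = [1, 2, 3, 4, 5]
    jump = [1, 1, 1, 1, 1]
    candidates = make_candidates(param, jump)
    with Pool() as pool:
        values = pool.map(value, candidates)   # all runs execute concurrently
    Output = sorted(zip(values, candidates))   # [[value, param], ...], best first
```

In the real setup `value` would launch the external program and block on its result, so the speedup is bounded by the number of nodes available.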

The number of variables can range from 3 to 12, so something like this could potentially cut the runtime from a year down to a week. All variables are dependent on each other, and I am only looking for a local minimum near the initial guess. I have started an implementation using Hessian matrices; however, that is quite involved. Is there anything out there that already does this, is there a simpler way, or does anyone have suggestions to get started?

So the primary question is the following: is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is reached? No analytic derivatives are available. What is a good way of going about this? Is there something already built that does this, or are there other options?

Thank you for your time.

As a small update, I do have this working by fitting simple parabolas through the three points in each dimension and then using the minimum as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
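The per-dimension parabola update described above can be written down directly. A minimal sketch (the fallback when the curvature is non-positive is my own assumption, not the author's):

```python
def parabola_step(x, h, f_minus, f0, f_plus):
    """Minimum of the parabola through (x-h, f_minus), (x, f0), (x+h, f_plus).
    Equivalent to one Newton step with central finite differences."""
    curvature = f_plus - 2.0 * f0 + f_minus
    if curvature <= 0.0:
        # No upward curvature: move toward the better neighbor (assumed fallback)
        return x + h if f_plus < f_minus else x - h
    return x - 0.5 * h * (f_plus - f_minus) / curvature
```

For a quadratic this lands exactly on the minimum, e.g. three points of (x - 2)^2 sampled at -1, 0, 1 give back x = 2.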

My current best implementation parallelizes the inner loop of Powell's method.

Thank you everyone for your comments. Unfortunately, it looks like there is simply no concise answer to this particular problem. If I get around to implementing something that does this, I will paste it here; however, as the project is not particularly important and the results are not pressing, I will likely be content letting it occupy a node for a while.

asked Dec 04 '12 by Daniel

3 Answers

I had the same problem while I was at university; we had a Fortran program to calculate the efficiency of an engine based on a group of variables. At the time we used modeFRONTIER and, if I recall correctly, none of its algorithms were able to generate multiple guesses.

The normal approach would be to set up a design of experiments (DOE), and there were some algorithms to generate the DOE to best fit your problem. After that we would run the individual DOE entries in parallel, and an algorithm would "watch" the development of the optimizations, showing the current best design.
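A DOE in this setting is just a space-filling set of parameter vectors whose runs are independent. modeFRONTIER's own generators aren't shown here; as a sketch, SciPy's Latin hypercube sampler (`scipy.stats.qmc`, SciPy >= 1.7) can produce one, with made-up bounds:

```python
import numpy as np
from scipy.stats import qmc

# 16 space-filling designs in 5 dimensions; the bounds are illustrative only.
sampler = qmc.LatinHypercube(d=5, seed=0)
unit_samples = sampler.random(n=16)              # points in the unit hypercube
lower, upper = np.zeros(5), np.full(5, 10.0)
designs = qmc.scale(unit_samples, lower, upper)  # rescale to the parameter bounds
# Each row of `designs` is one independent run, so all 16 can execute in parallel.
```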

Side note: if you don't have a cluster and need more computing power, HTCondor may help you.

answered Nov 02 '22 by Mac


Are derivatives of your goal function available? If so, you can use gradient descent (old and slow, but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite-difference approximations to the derivatives, you are much better off using conjugate gradient than Newton's method.

A more modern method is SPSA, a stochastic method that doesn't require derivatives. For somewhat well-behaved problems, SPSA requires far fewer evaluations of the goal function than the finite-difference approximation to conjugate gradient for the same rate of convergence.
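SPSA needs only two evaluations of the goal function per iteration, regardless of dimension, because it perturbs all parameters simultaneously along a random direction. A minimal sketch, assuming a cheap test function in place of the expensive external program (gain-sequence exponents follow Spall's standard recommendations):

```python
import numpy as np

def spsa_minimize(f, x0, a=0.1, c=0.1, n_iter=500, seed=0):
    """Minimal SPSA sketch: two f-evaluations per iteration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        ak = a / (k + 1) ** 0.602   # step-size gain
        ck = c / (k + 1) ** 0.101   # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher directions
        # Simultaneous-perturbation gradient estimate from just two runs;
        # 1/delta equals delta since each entry is +-1.
        ghat = (f(x + ck * delta) - f(x - ck * delta)) / (2.0 * ck) * delta
        x = x - ak * ghat
    return x

xmin = spsa_minimize(lambda x: float(np.sum((x - 3.0) ** 2)), np.zeros(5))
```

The two evaluations per iteration can themselves run in parallel, but unlike finite differences the cost does not grow with the number of parameters.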

answered Nov 02 '22 by Alex I


There are two ways of estimating gradients, one easily parallelizable, one not:

  • around a single point, e.g. (f(x + h e_i) - f(x)) / h; this is easily parallelizable up to Ndim
  • "walking" gradient: walk from x0 in direction e0 to x1, then from x1 in direction e1 to x2 ...; this is sequential.

Minimizers that use gradients are highly developed, powerful, converge quadratically (on smooth enough functions). The user-supplied gradient function can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method, see Numerical Recipes p. 509.
So I'm confused: how do you parallelize its inner loop?

I'd suggest scipy fmin_tnc with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw, this compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
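Concretely, a parallel central-difference gradient can be handed to `fmin_tnc` as `fprime`. This is only a sketch: `f` below is a cheap stand-in for the expensive external program, and the step size `h` is an assumption.

```python
import numpy as np
from multiprocessing import Pool
from scipy.optimize import fmin_tnc

def f(x):
    # Cheap stand-in for the expensive run_program + parse step (hypothetical)
    return float(np.sum((np.asarray(x) - 3.0) ** 2))

def parallel_central_gradient(x, h=1e-4, pool=None):
    """Central-difference gradient; the 2*N shifted evaluations are
    independent, so a Pool can run them concurrently."""
    x = np.asarray(x, dtype=float)
    points = []
    for i in range(len(x)):
        for sign in (+1.0, -1.0):
            p = x.copy()
            p[i] += sign * h
            points.append(p)
    vals = list(pool.map(f, points)) if pool else [f(p) for p in points]
    return np.array([(vals[2 * i] - vals[2 * i + 1]) / (2.0 * h)
                     for i in range(len(x))])

if __name__ == "__main__":
    with Pool() as pool:
        x_opt, nfeval, rc = fmin_tnc(
            f, np.zeros(5),
            fprime=lambda x: parallel_central_gradient(x, pool=pool),
            disp=0)
```

Each TNC iteration then costs roughly one wall-clock program run for the gradient (all 2*N shifts in flight at once) plus the line-search evaluations, which remain sequential.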

answered Nov 02 '22 by denis