 

How to speed up python curve_fit over a 2D array?

I have to use the curve_fit function from scipy.optimize over a large set of data (5,000,000 fits). So basically I've created a 2D array: the first dimension is the number of fits to perform, the second dimension is the number of points used for each fit.

import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 3, 4])

# data has shape (n_fits, n_points); one fit per row
for d in np.ndindex(data.shape[0]):
  try:
    popt, pcov = curve_fit(func, t, np.squeeze(data[d, :]), p0=[1000, 100])
  except RuntimeError:
    print("Error - curve_fit failed")

multiprocessing can be used to speed up the full process, but it is still quite slow. Is there a way to use curve_fit in a "vectorized" manner?
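
A minimal sketch of the multiprocessing approach (assuming func, t and data are defined as above; fit_row is a hypothetical helper and error handling is simplified):

import numpy as np
from multiprocessing import Pool
from scipy.optimize import curve_fit

def fit_row(row):
    # Fit a single row; return NaNs if the fit does not converge.
    # func, t and data must be defined at module level so workers can see them.
    try:
        popt, _ = curve_fit(func, t, row, p0=[1000, 100])
        return popt
    except RuntimeError:
        return np.full(2, np.nan)

if __name__ == "__main__":
    with Pool() as pool:
        # One fit per row of data, spread over all available cores.
        results = pool.map(fit_row, data, chunksize=1000)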

asked Jun 26 '15 by francois rousseau

3 Answers

curve_fit extends the functionality of scipy.optimize.leastsq, which is itself a wrapper for the underlying MINPACK lmdif and lmder Fortran routines. It looks like multithreading is not possible; check out this link, which says:

The underlying Fortran 77 routines (MINPACK lmder.f and lmdif.f) are not reentrant, so the GIL cannot be released. (Thus no chance of parallel processing with threads.)

There is still an open ticket to develop this, but it does not look like it will be finished any time soon. You would either need to use a different library or write a wrapper/function in lower-level code. There are papers on implementations of parallel Levenberg-Marquardt algorithms.

Maybe there is another solution: use less data, or, as a rough estimate, randomly split your data into parts, curve fit each of the parts in a separate process (with multiprocessing), and take an average of the coefficients at the end.
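
A rough sketch of that splitting idea for one large fit (assuming func is the model being fitted; t_all and y_all are hypothetical arrays holding all the points, and the averaging trades accuracy for speed):

import numpy as np
from multiprocessing import Pool
from scipy.optimize import curve_fit

def fit_subset(subset):
    # Fit one random subset of the data and return its coefficients.
    # func must be defined at module level so worker processes can see it.
    t_sub, y_sub = subset
    popt, _ = curve_fit(func, t_sub, y_sub, p0=[1000, 100])
    return popt

if __name__ == "__main__":
    # Shuffle the point indices once, then split them into 8 parts.
    idx = np.random.permutation(len(t_all))
    parts = [(t_all[i], y_all[i]) for i in np.array_split(idx, 8)]
    with Pool() as pool:
        coeffs = pool.map(fit_subset, parts)
    popt_avg = np.mean(coeffs, axis=0)  # average the fitted coefficients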

answered by Ed Smith


In my experience, you should supply the Jacobian to curve_fit if possible (via its jac argument). This saves time by avoiding repeated calls to func just to approximate the Jacobian numerically, and it can give you a significant speed boost, especially if you are dealing with a large number of optimizable parameters.
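
For example, a sketch with a hypothetical two-parameter exponential model, passing the analytic Jacobian through the jac argument:

import numpy as np
from scipy.optimize import curve_fit

def func(t, a, b):
    # Hypothetical model: exponential decay with amplitude a and rate b.
    return a * np.exp(-b * t)

def jac(t, a, b):
    # Analytic Jacobian: one row per data point, one column per parameter.
    e = np.exp(-b * t)
    return np.column_stack((e, -a * t * e))

t = np.linspace(0, 4, 50)
y = func(t, 1000.0, 0.5) + np.random.normal(0, 10, t.size)

# With jac supplied, curve_fit skips the finite-difference approximation.
popt, pcov = curve_fit(func, t, y, p0=[900.0, 0.3], jac=jac)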

answered by Bhanuday Sharma


One way to speed it up is by adding in some prior knowledge to curve_fit.

If you know the range you expect your parameters to fall in, and if you don't need precision down to the 100th significant figure, you can speed up your computations massively.

Here is an example, in which you'd be fitting for param1 and param2:

import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 3, 4])

def func(t, param1, param2):
  return param1 * t + param2 * np.exp(t)

for d in np.ndindex(data.shape[0]):
  try:
    # min_param1 etc. are placeholders for the ranges you expect
    popt, pcov = curve_fit(func, t, np.squeeze(data[d, :]), p0=[1000, 100],
                           bounds=([min_param1, min_param2], [max_param1, max_param2]),
                           ftol=0.5, xtol=0.5)
  except RuntimeError:
    print("Error - curve_fit failed")

Note the extra keyword arguments bounds, ftol and xtol. You can read about them in the scipy.optimize.curve_fit documentation.

answered by dangom