I have to use the curve_fit numpy function over a large set of data (5 000 000). So basically I've created a 2D array. First dimension is the number of fittings to perform, second dimension is the number of points used for the fitting.
t = np.array([0 1 2 3 4])
for d in np.ndindex(data.shape[0]):
try:
popt, pcov = curve_fit(func, t, np.squeeze(data[d,:]), p0=[1000,100])
except RuntimeError:
print("Error - curve_fit failed")
multiprocessing can be used to speed up the full process, but it is still quite slow. Is there a way to use curve_fit in a "vectorized" manner?
Curve fit extends the functionality of scipy.optimize.leastsq
which is itself a wrapper for the underlying MINPACK lmdif
and lmder
fortran routines. It looks like multi-threading is not possible, check out this link, which says,
The underlying Fortran 77 routines (MINPACK lmder.f and lmdif.f) are not reentrant, so the GIL cannot be released. (Thus no chance of parallel processing with threads.)
There is still an open ticket to develop this but it looks like it can not be finished... You would either need to use a different library or write a wrapper/function in a lower level code. There are papers on implementations of parallel Levenberg-Marquardt algorithms.
Maybe there is another solution, use less data or as a rough estimate, you could randomly split your data into parts, curve fit each of the parts on a separate thread (with multi-processor) and take an average of the coefficients at the end.
With my experience, you should supply jacobian to curve_fit, if possible. It will save time by avoiding calling func
again and again to calculate the jacobian. It would give you a significant speed boost, especially if you are dealing with a large number of optimizable parameters.
One way to speed it up is by adding in some prior knowledge to curve_fit.
If you know the range you expect your parameters to be, and if you don't need precision up to the 100th significant number, you can speed up your computations massively.
Here an example, in which you'd be fitting for param1
and param2
:
t = np.array([0 1 2 3 4])
def func(t, param1, param2):
return param1*t + param2*np.exp(t)
for d in np.ndindex(data.shape[0]):
try:
popt, pcov = curve_fit(func, t, np.squeeze(data[d,:]), p0=[1000,100],
bounds=([min_param1, min_param2],[max_param1, max_param2]),
ftol=0.5, xtol=0.5)
except RuntimeError:
print("Error - curve_fit failed")
Note the extra key arguments bounds
, ftol
and xtol
. You can read about them here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With