
Scipy.optimize.curve_fit won't fit cosine power law

For several hours now I have been trying to fit a model to a (generated) dataset, as a test case for a problem I've been struggling with. I generated data points for the function f(x) = A*cos^n(x) + b and added some noise. When I try to fit the dataset with this function and curve_fit, I get the error

./tester.py:10: RuntimeWarning: invalid value encountered in power
  return Amp*(np.cos(x))**n + b
/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py:690: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)

The code I'm using to generate the datapoints and fit the model is the following:

#!/usr/bin/env python

from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure, show, rc, plot

def f(x, Amp, n, b):

    return np.real(Amp*(np.cos(x))**n + b)

x = np.arange(0, 6.28, 0.01)
randomPart = np.random.rand(len(x))-0.5
fig = figure()
sample = f(x, 5, 2, 5)+randomPart
frame = fig.add_subplot(1,1,1)

frame.plot(x, sample, label="Sample measurements")

popt, pcov = curve_fit(f, x, sample, p0=(1,1,1))

modeldata = f(x, popt[0], popt[1], popt[2])
print(modeldata)
frame.plot(x, modeldata, label="Best fit")

frame.legend()
frame.set_xlabel("x")
frame.set_ylabel("y")

show()

The noisy data is shown in the figure below.

[Figure: the generated noisy sample; curve_fit produces no usable fit]

Does anyone have a clue what's going on? I suspect it has something to do with the power law going into the complex domain, since the real part of the function is nowhere divergent. I have already tried returning only the real part of the function, setting realistic bounds in curve_fit, and using a numpy array instead of a Python list for p0. I'm running the latest version of scipy available to me, scipy 0.17.0-1.

asked May 27 '17 by Leander



1 Answer

The problem is the following:

>>> (-2)**1.1
(-2.0386342710747223-0.6623924280875919j)
>>> np.array(-2)**1.1
__main__:1: RuntimeWarning: invalid value encountered in power
nan

Unlike native Python floats, numpy doubles usually refuse to take part in operations that would lead to complex results:

>>> np.sqrt(-1)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
nan
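
If you explicitly hand numpy complex input, on the other hand, it is happy to return the complex result:

>>> np.sqrt(np.complex128(-1))
1j
>>> np.complex128(-2)**1.1
(-2.0386342710747223-0.6623924280875919j)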

As a quick workaround I suggest adding an np.abs call to your function, and using appropriate bounds in curve_fit to make sure this doesn't produce a spurious fit. If your model is near the truth and the cosine in your sample is positive, adding an absolute value around it is a no-op (update: I realize this is never the case; see the proper approach below).

def f(x, Amp, n, b):

    return Amp*(np.abs(np.cos(x)))**n + b # only change here
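
For the bounds part, a minimal sketch (the specific limits are my assumption that Amp and n should be nonnegative, so the absolute value can't smuggle in a sign flip):

# keep Amp and n nonnegative; b is left unconstrained
popt, pcov = curve_fit(f, x, sample, p0=(1, 1, 1),
                       bounds=((0, 0, -np.inf), (np.inf, np.inf, np.inf)))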

With this small np.abs change I get the following:

[Figure: the resulting fit looks fine]

For reference, the parameters from the fit are (4.96482314, 2.03690954, 5.03709923), compared to the generating values (5, 2, 5).


After giving it a bit more thought I realized that the cosine will always be negative for half your domain (duh). So the workaround I suggested might be a bit problematic, or at least its correctness is non-trivial. On the other hand, your original formula contains cos(x)^n, and with negative values of cos(x) this only makes sense as a model if n is an integer; otherwise you get a complex result. Since we can't solve Diophantine fitting problems, we need to handle this properly.
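
To see the issue concretely, a small check: wherever cos(x) is negative, a non-integer power is nan for doubles but perfectly well-defined for complex input:

print(np.cos(2.0)**1.5)                  # nan, since cos(2.0) is negative
print(np.complex128(np.cos(2.0))**1.5)   # a finite complex number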

The most proper way (by which I mean the way least likely to bias your data) is this: first do the fit with a model that casts its input to complex numbers and takes the complex magnitude of the output:

def f(x, Amp, n, b):

    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b

This is obviously much less efficient than my workaround, since each fitting step creates a new complex copy of the mesh and does extra work, both in the form of complex arithmetic and an extra magnitude calculation.
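
If the overhead bothers you, the complex cast can be hoisted out of the fitting loop (a sketch, assuming the x mesh doesn't change during the fit; xc is a name I'm introducing here):

xc = x.astype(np.complex128)  # cast the mesh to complex once, up front

def f(x, Amp, n, b):
    # deliberately ignores the x argument and reuses the precomputed complex mesh
    return Amp*np.abs(np.cos(xc)**n) + b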

Even with no bounds set, the complex-magnitude model gives me the following fit:

[Figure: fit from the complex-magnitude model]

The parameters are (5.02849409, 1.97655728, 4.96529108); these are close too. However, if we put these values back into the actual model (without np.abs), we get imaginary parts with magnitude as large as 0.37, which is not overwhelming but significant.
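
One way to check this is to evaluate the raw complex model at the fitted parameters and inspect the imaginary part (a sketch; popt here stands for the three fitted values above):

# imaginary residue of the unmodified complex model
model_c = popt[0]*np.cos(x.astype(np.complex128))**popt[1] + popt[2]
print(np.abs(model_c.imag).max())  # about 0.37 for this fit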

So the second step should be redoing the fit with a proper model, one that has an integer exponent. Take the exponent 2, which is obvious from your fit, and do a new fit with this model; I don't believe any other approach gives you a mathematically sound result. You can also start from the original popt, hoping that it's indeed close to the truth. Of course we could use the original function with some currying, but it's much faster to use a dedicated double-specific version of your model:

from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import subplots, show

def f_aux(x, Amp, n, b):
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b

def f_real(x, Amp, n, b):
    return Amp*np.cos(x)**n + b


x = np.arange(0, 2*np.pi, 0.01)  # one full period
randomPart = np.random.rand(len(x)) - 0.5
sample = f_real(x, 5, 2, 5) + randomPart  # n=2 is an integer, so the real model is exact here

fig, (frame_aux, frame) = subplots(ncols=2)
for fr in (frame_aux, frame):
    fr.plot(x, sample, label="Sample measurements")
    fr.set_xlabel("x")
    fr.set_ylabel("y")

# auxiliary fit for n value
popt_aux, pcov_aux = curve_fit(f_aux, x, sample, p0=(1,1,1))

modeldata = f_aux(x, *popt_aux)  # evaluate the complex-magnitude model
print('Auxiliary fit parameters: {}'.format(popt_aux))
frame_aux.plot(x, modeldata, label="Auxiliary fit")

# check that the fitted exponent is close to an integer, then round it
n = np.round(popt_aux[1])

# actual fit with integer exponent (n is fixed via the lambda's default argument)
popt, pcov = curve_fit(lambda x, Amp, b, n=n: f_real(x, Amp, n, b),
                       x, sample, p0=(popt_aux[0], popt_aux[2]))

modeldata = f_real(x, popt[0], n, popt[1])  # evaluate the real model with n fixed
print('Final fit parameters: {}'.format([popt[0], n, popt[1]]))
frame.plot(x, modeldata, label="Best fit")

frame_aux.legend()
frame.legend()

show()

Note that I changed a few things in your code, none of which really affects my point. Here is the figure produced by the code above, showing both the auxiliary fit and the proper one:

[Figure: sample data with the auxiliary fit (left) and the final integer-exponent fit (right)]

The output:

Auxiliary fit parameters: [ 5.02628994  2.00886409  5.00652371]
Final fit parameters: [5.0288141074549699, 2.0, 5.0009730316739462]

Just to reiterate: while there might be no visual difference between the auxiliary fit and the proper one, only the latter gives a meaningful answer to your problem.
