For several hours now, I have been trying to fit a model to a (generated) dataset as a test case for a problem I've been struggling with. I generated datapoints for the function f(x) = A*cos^n(x) + b and added some noise. When I try to fit the dataset with this function and curve_fit, I get the following warnings:
./tester.py:10: RuntimeWarning: invalid value encountered in power
return Amp*(np.cos(x))**n + b
/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py:690: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)
The code I'm using to generate the datapoints and fit the model is the following:
#!/usr/bin/env python
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure, show, rc, plot
def f(x, Amp, n, b):
    return np.real(Amp*(np.cos(x))**n + b)
x = np.arange(0, 6.28, 0.01)
randomPart = np.random.rand(len(x))-0.5
fig = figure()
sample = f(x, 5, 2, 5)+randomPart
frame = fig.add_subplot(1,1,1)
frame.plot(x, sample, label="Sample measurements")
popt, pcov = curve_fit(f, x, sample, p0=(1,1,1))
modeldata = f(x, popt[0], popt[1], popt[2])
print(modeldata)
frame.plot(x, modeldata, label="Best fit")
frame.legend()
frame.set_xlabel("x")
frame.set_ylabel("y")
show()
The noisy data is shown in the image below.
Do any of you have a clue what's going on? I suspect it has something to do with the power law going into the complex domain, as the real part of the function is nowhere divergent. I have already tried returning only the real part of the function, setting realistic bounds in curve_fit, and using a numpy array instead of a python list for p0. I'm running the latest version of scipy available to me, scipy 0.17.0-1.
Some background: scipy.optimize.curve_fit applies non-linear least squares to fit a model function to data, finding the set of parameters that best matches the provided observations. It returns popt, an array of optimal parameter values minimizing the sum of squared residuals, and pcov, a 2-D array containing the estimated covariance of popt, whose diagonal gives the variance of each parameter estimate.
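For illustration, a minimal sketch of a typical curve_fit call (the model and data here are made up for the example):
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # simple illustrative model: a straight line
    return a*x + b

xdata = np.linspace(0, 10, 50)
ydata = model(xdata, 2.0, 1.0) + 0.1*np.random.randn(50)

popt, pcov = curve_fit(model, xdata, ydata, p0=(1, 1))
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainty of each parameter
print(popt, perr)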
The problem is the following:
>>> (-2)**1.1
(-2.0386342710747223-0.6623924280875919j)
>>> np.array(-2)**1.1
__main__:1: RuntimeWarning: invalid value encountered in power
nan
Unlike native python floats, numpy doubles usually refuse to take part in operations leading to complex results:
>>> np.sqrt(-1)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
nan
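Conversely, numpy will happily produce complex results if you opt in with a complex dtype:
>>> np.array(-2, dtype=np.complex128)**1.1
(-2.0386342710747223-0.6623924280875919j)
>>> np.sqrt(np.complex128(-1))
1j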
As a quick workaround I suggest adding an np.abs call to your function, and using appropriate bounds for fitting to make sure this doesn't give a spurious fit. If your model is near the truth and the cosine in your sample is positive, then adding an absolute value around it should be a no-op (update: I realize this is never the case, see the proper approach below).
def f(x, Amp, n, b):
    return Amp*(np.abs(np.cos(x)))**n + b  # only change here
With this small change I get this:
For reference, the parameters from the fit are (4.96482314, 2.03690954, 5.03709923), comparing to the generation with (5, 2, 5).
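As for the bounds mentioned earlier: with this data all three parameters should be non-negative, so something along these lines might be appropriate (a sketch; the bounds= keyword needs scipy 0.17 or later, which you have):
popt, pcov = curve_fit(f, x, sample, p0=(1, 1, 1),
                       bounds=([0, 0, 0], [np.inf, np.inf, np.inf]))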
Update: after giving it a bit more thought I realized that the cosine will always be negative for half your domain (duh). So the workaround I suggested might be a bit problematic, or at least its correctness is non-trivial. On the other hand, your original formula contains cos(x)^n; with negative values of cos(x) this only makes sense as a model if n is an integer, otherwise you get a complex result. Since we can't solve Diophantine fitting problems, we need to handle this properly.
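The integer case is fine in numpy; it is only non-integer exponents of a negative base that produce nan:
>>> np.array(-0.5)**2
0.25
>>> np.array(-0.5)**2.5
__main__:1: RuntimeWarning: invalid value encountered in power
nan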
The most proper way (by which I mean the way that is least likely to bias your data) is this: first do the fitting with a model that converts your data to complex numbers, then takes the complex magnitude on output:
def f(x, Amp, n, b):
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b
This is obviously much less efficient than my workaround, since in each fitting step we create a new mesh, and do some extra work both in the form of complex arithmetic and an extra magnitude calculation. This gives me the following fit even with no bounds set:
The parameters are (5.02849409, 1.97655728, 4.96529108). These are close too. However, if we put these values back into the actual model (without np.abs), we get imaginary parts as large as -0.37, which is not overwhelming but significant.
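You can check this yourself by evaluating the complex model at the fitted parameters and looking at the imaginary part directly (popt_aux refers to the auxiliary fit result, using the naming of the full script below):
z = popt_aux[0]*np.cos(x.astype(np.complex128))**popt_aux[1] + popt_aux[2]
print(z.imag.min(), z.imag.max())  # how far the model strays off the real axis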
So the second step should be redoing the fit with a proper model, one that has an integer exponent. Take the exponent 2, which is obvious from your fit, and do a new fit with this model. I don't believe any other approach gives you a mathematically sound result. You can also start from the original popt, hoping that it's indeed close to the truth. Of course we could use the original function with some currying, but it's much faster to use a dedicated double-specific version of your model.
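For reference, the currying route could look like this; f_real_bn is a hypothetical variant of the model with n moved to the end of the signature, since curve_fit fills the fitted parameters positionally:
from functools import partial

def f_real_bn(x, Amp, b, n):
    # same model as f_real below, with n last so partial can freeze it
    return Amp*np.cos(x)**n + b

popt, pcov = curve_fit(partial(f_real_bn, n=2), x, sample, p0=(1, 1))
The full script below uses an equivalent lambda together with a dedicated real-valued model instead: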
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import subplots, show

def f_aux(x, Amp, n, b):
    # auxiliary model: complex-safe, returns the magnitude of the output
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b

def f_real(x, Amp, n, b):
    # double-specific model; only mathematically sound for integer n
    return Amp*np.cos(x)**n + b

x = np.arange(0, 2*np.pi, 0.01)  # exact 2*pi endpoint instead of 6.28
randomPart = np.random.rand(len(x)) - 0.5
sample = f_real(x, 5, 2, 5) + randomPart
fig, (frame_aux, frame) = subplots(ncols=2)
for fr in frame_aux, frame:
    fr.plot(x, sample, label="Sample measurements")
    fr.set_xlabel("x")
    fr.set_ylabel("y")

# auxiliary fit for the n value
popt_aux, pcov_aux = curve_fit(f_aux, x, sample, p0=(1,1,1))
print('Auxiliary fit parameters: {}'.format(popt_aux))
frame_aux.plot(x, f_aux(x, *popt_aux), label="Auxiliary fit")

# check visually, or test that the fitted exponent is close to an integer; otherwise stop here
n = np.round(popt_aux[1])

# actual fit with integral exponent; n is frozen via the lambda's default argument
popt, pcov = curve_fit(lambda x, Amp, b, n=n: f_real(x, Amp, n, b),
                       x, sample, p0=(popt_aux[0], popt_aux[2]))
print('Final fit parameters: {}'.format([popt[0], n, popt[1]]))
frame.plot(x, f_real(x, popt[0], n, popt[1]), label="Best fit")
frame_aux.legend()
frame.legend()
show()
Note that I changed a few things in your code which don't really affect my point. The figure from the above code, showing both the auxiliary fit and the proper one:
The output:
Auxiliary fit parameters: [ 5.02628994 2.00886409 5.00652371]
Final fit parameters: [5.0288141074549699, 2.0, 5.0009730316739462]
Just to reiterate: while there might be no visual difference between the auxiliary fit and the proper one, only the latter gives a meaningful answer to your problem.