I'm trying to fit data that resembles a cosine function with a decaying Gaussian envelope using python, and in particular scipy.optimize.curve_fit
. However, it seems like my code is not varying the parameters enough.
For example, in this data, the center of the interferogram is slightly negative (around -0.1), but even when I provide a guess that is very close, the optimized parameters are clearly off. The horizontal offset just stays very close to my initial guess even if it is clearly not an optimal point. I checked that by manually varying the horizontal offset and calculating the coefficient of determination and the values offered by curve_fit
are not the most optimal.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
id_ = '0'
filename = '250516_FF_01_coincidences_'+id_+'mm.txt'
def FF_function(x,l,s,A,xoff):
d = x - xoff
return A*np.cos(2*np.pi/l*d)*np.exp(-0.5*(d/s)**2)+A #
data = pd.read_csv(filename, sep=' ',header= None).to_numpy()
# data = data.astype(dtype=np.float32)
fig = plt.subplots(figsize=(10,5))
ax = plt.subplot(111)
ax.plot(data[:,0],data[:,1],alpha = 0.3)
guess = [4.40673558e-02, 2.63973377e-01, 1.63702697e+02, -0.2]
xx=np.linspace(min(data[:,0]),max(data[:,0]),1000)
# ax.plot(xx,FF_function(xx,*guess)
popt,cov = curve_fit(FF_function,data[:,0],data[:,1],p0 = guess)
ax.plot(xx,FF_function(xx,*popt), color ='C0',alpha = 0.5)
SSE = np.sum(np.square(data[:,1]-FF_function(data[:,0], *popt)))
SST = np.sum(np.square(data[:,1]-np.mean(data[:,1])))
r2 = 1 - SSE/SST
print(r2)
print(popt)
#
ax.set_xlabel('Displacement (mm)')
ax.set_ylabel('Coincidences')
I can try doing a manual minimization of the determination coefficient, but I'm curious about what is going wrong in my fit.
Lets confirm the model can be fitted.
We import libraries and your dataset:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import optimize
data = pd.read_csv("250516_FF_01_coincidences_0mm.txt", sep=" ", header=None, names=["x", "y"])
We define the model:
def model(x, A, omega, sigma, x0):
return A * (np.sin(2 * np.pi * omega * (x - x0)) * np.exp(-0.5 * np.power((x - x0) / sigma, 2.)) + 1.)
We apply curve fit with some raw guess:
popt, pcov = optimize.curve_fit(model, data['x'], data['y'], p0=[200, 30, 1, 0])
# [ 1.63791101e+02, 2.26881496e+01, 2.62294794e-01, -1.01404332e-02]
We can regress using the parameters:
yhat = model(data['x'], *popt)
Regression looks like:
Luckily the pulsation omega
of the curve is found without any trouble. That's a good point because curve_fit
is not good at fitting period function such sin
.
But your problem mostly reside in the value of x0
which is a very sensitive parameters in this fit because it shapes multiple minima error landscape. Lets investigate how the Loss function varies wrt x0
while other parameters are kept constant:
def loss_factory(x, y, p):
@np.vectorize
def wrapped(x0):
return 0.5 * np.sum(np.power((y - model(x, *p, x0)), 2.))
return wrapped
loss = loss_factory(data['x'], data['y'], popt[:-1])
x0lin = np.linspace(-0.25, 0.25, 500)
L = loss(x0lin)
The result looks like:
You can see that there is multiple local minima for the Loss function that gives different offset (origin shift) to your regressed parameters depending on the initial guess. This is why you will need to refine the initial guess by hand or choose a global minimum optimizer.
As suggested by @Reinderien
, we may decouple x0
into two parameters phase offset phi
and envelope mean mu
. Such a model is defined as follow:
def model2(x, A, omega, sigma, phi, mu):
return A * (np.sin(2 * np.pi * omega * x + phi) * np.exp(-0.5 * np.power((x - mu) / sigma, 2.)) + 1.)
Regression performs well provided a viable raw guess:
popt2, pcov2 = optimize.curve_fit(model2, data['x'], data['y'], p0=[160, 22, 0.1, 0, 0])
# array([ 1.63807815e+02, 2.26888064e+01, 2.62230722e-01, 1.44459161e+00, -3.42999984e-02])
yhat2 = model2(data['x'], *popt2)
The error landscape for the position parameters behaves way better:
def loss_factory_2(x, y, p):
@np.vectorize
def wrapped(phi, mu):
return 0.5 * np.sum(np.power((y - model2(x, *p, phi, mu)), 2.))
return wrapped
philin = np.linspace(-2 * np.pi, 2 * np.pi, 200)
mulin = np.linspace(-2, 2, 200)
Phi, Mu = np.meshgrid(philin, mulin)
loss2 = loss_factory_2(data['x'], data['y'], popt2[:-2])
L2 = loss2(Phi, Mu)
Parameter phi
indeed exhibit a 2 pi
periodicity, while mu
does not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With