I'm trying to fit a histogram with some data in it using `scipy.optimize.curve_fit`. If I want to add an error in `y`, I can simply do so by applying a `weight` to the fit. But how do I apply the error in `x` (i.e. the error due to binning in the case of histograms)?
My question also applies to errors in `x` when making a linear regression with `curve_fit` or `polyfit`; I know how to add errors in `y`, but not in `x`.
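For reference, this is how errors in y are usually passed for a straight-line fit: `curve_fit` takes the standard deviations via `sigma`, while `np.polyfit` takes weights `w = 1/sigma`. The model `line` and the synthetic data below are assumptions for illustration. Neither function has a parameter for errors in x.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    # straight-line model y = a*x + b
    return a * x + b

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
sigma_y = 0.2 + 0.1 * x            # assumed 1-sigma errors in y (vary with x)
y = line(x, 2.0, 1.0) + rng.normal(scale=sigma_y)

# curve_fit: pass the y errors as standard deviations via sigma
popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)

# polyfit: pass weights w = 1/sigma (not 1/sigma**2); returns [slope, intercept]
coeffs = np.polyfit(x, y, 1, w=1 / sigma_y)

print(popt, coeffs)
```

Both calls minimize the same weighted sum of squared y residuals, so they should agree to numerical precision.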
Here is an example (partly from the matplotlib documentation):
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# create the data histogram
mu, sigma = 200, 25
x = mu + sigma*np.random.randn(10000)

# define fit function
def gauss(x, *p):
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2*sigma**2))

# the histogram of the data
n, bins, patches = plt.hist(x, 50, histtype='step')
sigma_n = np.sqrt(n)  # Adding Poisson errors in y
bin_centres = (bins[:-1] + bins[1:])/2
sigma_x = (bins[1] - bins[0])/np.sqrt(12)  # Binning error in x
plt.setp(patches, 'facecolor', 'g', 'alpha', 0.75)

# fitting and plotting
# skip empty bins: sigma_n = 0 there, which makes curve_fit divide by zero
mask = n > 0
p0 = [700, 200, 25]
popt, pcov = curve_fit(gauss, bin_centres[mask], n[mask], p0=p0,
                       sigma=sigma_n[mask], absolute_sigma=True)
x = np.arange(100, 300, 0.5)
fit = gauss(x, *popt)
plt.plot(x, fit, 'r--')
plt.show()
```
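The binning error `sigma_x` in the code above is the standard deviation of a uniform distribution over one bin of width $\Delta$ (`bins[1] - bins[0]`), i.e. it assumes each entry is equally likely to lie anywhere inside its bin:

```latex
\sigma_x^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} u^2\,du
           = \frac{1}{\Delta}\cdot\frac{\Delta^3}{12}
           = \frac{\Delta^2}{12}
\quad\Longrightarrow\quad
\sigma_x = \frac{\Delta}{\sqrt{12}}
```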
Now, this fit (when it doesn't fail) does consider the y-errors `sigma_n`, but I haven't found a way to make it consider `sigma_x`. I scanned a couple of threads on the scipy mailing list and found out how to use the `absolute_sigma` value, and a post on Stack Overflow about asymmetrical errors, but nothing about errors in both directions. Is this possible to achieve?
The SciPy open source library provides the `curve_fit()` function for curve fitting via nonlinear least squares. The function takes the mapping (model) function as well as the input and output data as arguments. The mapping function must take the input data as its first argument, followed by the parameters to be fitted.
`curve_fit()` returns the optimal parameter values and the estimated covariance matrix of those parameters. To fit data, pass the target function and the x and y data to `curve_fit()`; the first returned array then contains the fitted parameter values (for example a, b and c for a three-parameter model).
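A minimal sketch of that workflow. The exponential model `func`, its parameters `a`, `b`, `c` and the synthetic data are illustrative assumptions, not taken from the question:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    # mapping function: independent variable first, then the parameters to fit
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(1)
xdata = np.linspace(0, 4, 50)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.02 * rng.normal(size=xdata.size)

# popt: optimal a, b, c; pcov: their estimated covariance matrix
popt, pcov = curve_fit(func, xdata, ydata)
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation parameter errors
a, b, c = popt
print(popt, perr)
```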
`scipy.optimize.curve_fit` uses standard nonlinear least squares optimization and therefore only minimizes the deviations in the response (dependent) variable. If you want errors in the independent variable to be taken into account, you can try `scipy.odr`, which uses orthogonal distance regression. As its name suggests, it minimizes the orthogonal distance between the data points and the fitted curve, so deviations in both the independent and the dependent variables are accounted for.
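Applied to the histogram from the question, a sketch might look like this. It assumes empty bins are dropped (a zero Poisson error would turn into an infinite weight), uses `np.histogram` instead of plotting, and treats the question's `sigma_x` as a constant x error for every bin. Note that `scipy.odr` expects the model as `f(beta, x)`, with the parameters first.

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

def gauss(beta, x):
    # scipy.odr model signature: parameter vector first, then x
    A, mu, sigma = beta
    return A * np.exp(-(x - mu)**2 / (2 * sigma**2))

rng = np.random.default_rng(2)
samples = rng.normal(200, 25, size=10000)
n, bins = np.histogram(samples, bins=50)
bin_centres = (bins[:-1] + bins[1:]) / 2
sigma_x = (bins[1] - bins[0]) / np.sqrt(12)   # binning error in x

mask = n > 0                                    # drop empty bins (sy = 0 -> infinite weight)
counts = n[mask].astype(float)
data = RealData(bin_centres[mask], counts,
                sx=np.full(counts.size, sigma_x),  # x errors from binning
                sy=np.sqrt(counts))                # Poisson errors in y
odr = ODR(data, Model(gauss), beta0=[700.0, 200.0, 25.0])
odr.set_job(fit_type=0)                         # explicit ODR: errors in x and y
output = odr.run()
print(output.beta, output.sd_beta)
```

Because `sigma_x` is small compared with the width of the Gaussian, the result here should be close to the `curve_fit` result; the x errors matter more when they are large relative to how fast the model changes.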
Have a look at the sample below. The `fit_type` parameter of `set_job` determines whether `scipy.odr` does a full (explicit) ODR (`fit_type=0`) or an ordinary least-squares optimization (`fit_type=2`).
EDIT
Although the example worked, it did not make much sense, since the y data was calculated from the noisy x data, which just resulted in an unequally spaced independent variable. I updated the sample, which now also shows how to use `RealData`, which lets you specify the standard errors of the data instead of the weights.
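To illustrate the difference between standard errors and weights: `RealData` converts `sx` and `sy` into weights of 1/sx² and 1/sy², so passing those weights to `Data` (as `wd` and `we`) should give the same fit. The straight-line model and the noise levels below are assumptions for illustration.

```python
import numpy as np
from scipy.odr import ODR, Model, Data, RealData

def linear(beta, x):
    # straight line: y = beta[0] + beta[1]*x
    return beta[0] + beta[1] * x

rng = np.random.default_rng(3)
sx, sy = 0.3, 0.1                                  # assumed standard errors
x_true = np.linspace(0, 5, 40)
x = x_true + rng.normal(scale=sx, size=40)
y = linear([1.0, 2.0], x_true) + rng.normal(scale=sy, size=40)

# RealData: give the standard errors directly
fit_sd = ODR(RealData(x, y, sx=sx, sy=sy), Model(linear), beta0=[0.0, 1.0]).run()

# Data: give the equivalent weights 1/sigma**2 instead
fit_w = ODR(Data(x, y, wd=1 / sx**2, we=1 / sy**2), Model(linear), beta0=[0.0, 1.0]).run()

print(fit_sd.beta, fit_w.beta)
```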
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.odr import ODR, Model, RealData

def func(beta, x):
    # cubic model: y = b0 + b1*x + b2*x**3
    y = beta[0] + beta[1]*x + beta[2]*x**3
    return y

# generate data
x = np.linspace(-3, 2, 100)
y = func([-2.3, 7.0, -4.0], x)

# add some noise
x += np.random.normal(scale=0.3, size=100)
y += np.random.normal(scale=0.1, size=100)

# RealData takes the standard errors of x and y (sx, sy), not weights
data = RealData(x, y, sx=0.3, sy=0.1)
model = Model(func)
odr = ODR(data, model, beta0=[1, 0, 0])

# fit_type=2: ordinary least squares (only deviations in y are minimized)
odr.set_job(fit_type=2)
output = odr.run()
xn = np.linspace(-3, 2, 50)
yn = func(output.beta, xn)
plt.plot(x, y, 'ro')
plt.plot(xn, yn, 'k-', label='leastsq')

# fit_type=0: explicit orthogonal distance regression (deviations in x and y)
odr.set_job(fit_type=0)
output = odr.run()
yn = func(output.beta, xn)
plt.plot(xn, yn, 'g-', label='odr')
plt.legend(loc=0)
plt.show()
```
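Regarding `absolute_sigma` from the question: as far as I can tell, `output.sd_beta` is already scaled by the residual variance `output.res_var`, while `output.cov_beta` is not, so `np.sqrt(np.diag(output.cov_beta))` plays roughly the role of `absolute_sigma=True`. A sketch that checks this, reusing the cubic model with freshly generated (assumed) noisy data:

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

def func(beta, x):
    # same cubic model as in the example above
    return beta[0] + beta[1]*x + beta[2]*x**3

rng = np.random.default_rng(4)
x_true = np.linspace(-3, 2, 100)
x = x_true + rng.normal(scale=0.3, size=100)
y = func([-2.3, 7.0, -4.0], x_true) + rng.normal(scale=0.1, size=100)

output = ODR(RealData(x, y, sx=0.3, sy=0.1), Model(func), beta0=[1.0, 0.0, 0.0]).run()

# sd_beta includes the res_var scaling (like absolute_sigma=False)
sd_from_cov = np.sqrt(np.diag(output.cov_beta * output.res_var))
# errors taken purely from sx, sy (roughly like absolute_sigma=True)
sd_absolute = np.sqrt(np.diag(output.cov_beta))
print(output.sd_beta, sd_from_cov, sd_absolute)
```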