Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to get the error in fitting parameters from scipy.stats.norm.fit?

I have some data which I have fitted a normal distribution to using the scipy.stats.normal objects fit function like so:

import numpy as np                                                                                                                                                                                                                       
import matplotlib.pyplot as plt                                                                                                                                                                                                          
from scipy.stats import norm                                                                                                                                                                                                             
import matplotlib.mlab as mlab                                                                                                                                                                                                           

x = np.random.normal(size=50000)                                                                                                                                                                                                         

fig, ax = plt.subplots()                                                                                                                                                                                                                 

nbins = 75                                                                                                                                                                                                                               
mu, sigma = norm.fit(x)                                                                                                                                                                                                                  
n, bins, patches = ax.hist(x,nbins,normed=1,facecolor = 'grey', alpha = 0.5, label='before');                                                                                                                                            
y0 = mlab.normpdf(bins, mu, sigma) # Line of best fit                                                                                                                                                                                    
ax.plot(bins,y0,'k--',linewidth = 2, label='fit before')                                                                                                                                                                                 
ax.set_title('$\mu$={}, $\sigma$={}'.format(mu, sigma))                                                                                                                                                                                  

plt.show()                                                                                                                                                                                                                               

I would now like to extract the uncertainty/error in the fitted mu and sigma values. How can I go about this?

like image 802
SomeRandomPhysicist Avatar asked Mar 06 '18 12:03

SomeRandomPhysicist


People also ask

What does Scipy fit return?

Return estimates of shape (if applicable), location, and scale parameters from data. The default estimation method is Maximum Likelihood Estimation (MLE), but Method of Moments (MM) is also available.

What is loc parameter in Scipy?

loc is short for "location parameter", and scale is naturally any scale parameters. Location parameters would include the mean in the normal distribution and the median in the Cauchy distribution. Scale parameters are like the standard deviation in the normal distribution, or either parameter of the gamma distribution.

What is PPF in Scipy?

ppf() takes a percentage and returns a standard deviation multiplier for what value that percentage occurs at. It is equivalent to a, 'One-tail test' on the density plot. From scipy. stats.


2 Answers

You can use scipy.optimize.curve_fit: This method does not only return the estimated optimal values of the parameters, but also the corresponding covariance matrix:

popt : array

Optimal values for the parameters so that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized

pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).

How the sigma parameter affects the estimated covariance depends on absolute_sigma argument, as described above.

If the Jacobian matrix at the solution doesn’t have a full rank, then ‘lm’ method returns a matrix filled with np.inf, on the other hand ‘trf’ and ‘dogbox’ methods use Moore-Penrose pseudoinverse to compute the covariance matrix.

You can calculate the standard deviation errors of the parameters from the square roots of the diagonal elements of the covariance matrix as follows:

import numpy as np 
import matplotlib.pyplot as plt
from scipy.stats import norm 
from scipy.optimize import curve_fit

x = np.random.normal(size=50000)
fig, ax = plt.subplots() 
nbins = 75
n, bins, patches = ax.hist(x,nbins, density=True, facecolor = 'grey', alpha = 0.5, label='before'); 

centers = (0.5*(bins[1:]+bins[:-1]))
pars, cov = curve_fit(lambda x, mu, sig : norm.pdf(x, loc=mu, scale=sig), centers, n, p0=[0,1])

ax.plot(centers, norm.pdf(centers,*pars), 'k--',linewidth = 2, label='fit before') 
ax.set_title('$\mu={:.4f}\pm{:.4f}$, $\sigma={:.4f}\pm{:.4f}$'.format(pars[0],np.sqrt(cov[0,0]), pars[1], np.sqrt(cov[1,1 ])))

plt.show()

This results in the following plot:

enter image description here

like image 87
jdamp Avatar answered Oct 19 '22 19:10

jdamp


See also lmfit (https://github.com/lmfit/lmfit-py) which gives an easier interface and reports uncertainties in fitted variables. To fit data to a normal distribution, see http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peak-data-to-gaussian-lorentzian-and-voigt-profiles

and use something like

from lmfit.models import GaussianModel

model = GaussianModel()

# create parameters with initial guesses:
params = model.make_params(center=9, amplitude=40, sigma=1)  

result = model.fit(ydata, params, x=xdata)
print(result.fit_report())

The report will include the 1-sigma errors like

[[Variables]]
    sigma:       1.23218358 +/- 0.007374 (0.60%) (init= 1.0)
    center:      9.24277047 +/- 0.007374 (0.08%) (init= 9.0)
    amplitude:   30.3135620 +/- 0.157126 (0.52%) (init= 40.0)
    fwhm:        2.90157055 +/- 0.017366 (0.60%)  == '2.3548200*sigma'
    height:      9.81457817 +/- 0.050872 (0.52%)  == '0.3989423*amplitude/max(1.e-15, sigma)'
like image 25
M Newville Avatar answered Oct 19 '22 21:10

M Newville