I have some data which I am fitting to the gamma distribution using scipy.stats. I am able to extract the shape, loc, and scale parameters, and they look reasonable given the data ranges I expect.
My question is: is there a way to also get the errors in the parameters? Something like the output of curve_fit. NOTE: I don't use curve_fit directly because it does not work properly and most of the time is unable to compute the parameters of the gamma distribution. On the other hand, scipy.stats.gamma.fit works fine.
This is an example (not my actual data) of what I am doing.
from scipy.stats import gamma
shape = 12; loc = 0.71; scale = 0.0166
data = gamma.rvs(shape, loc=loc, scale=scale, size=1000)
params = gamma.fit(data) # params close to but not the same as (shape, loc, scale)
# HOW TO ESTIMATE/GET ERRORS FOR EACH PARAM?
Thanks in advance
(For comparison, MATLAB fits the gamma distribution with gamfit, fitdist, or mle. Unlike gamfit and mle, which return parameter estimates directly, fitdist returns a fitted probability distribution object, GammaDistribution, whose properties a and b store the parameter estimates.)
gamma.fit returns estimates of the shape (if applicable), location, and scale parameters from the data. The default estimation method is Maximum Likelihood Estimation (MLE), but Method of Moments (MM) estimation is also available in recent SciPy versions.
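A minimal sketch comparing the two estimators on synthetic data (the method="MM" keyword assumes SciPy >= 1.7.1; the data below are made up for illustration):
from scipy.stats import gamma
# synthetic data, roughly matching the example in the question
data = gamma.rvs(12, loc=0.71, scale=0.0166, size=1000)
params_mle = gamma.fit(data)               # default: Maximum Likelihood
params_mm = gamma.fit(data, method="MM")   # Method of Moments (SciPy >= 1.7.1)
print(params_mle)
print(params_mm)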
loc is short for "location parameter", and scale is, naturally, the scale parameter. Location parameters include the mean of the normal distribution and the median of the Cauchy distribution; scale parameters include the standard deviation of the normal distribution and, in the gamma distribution, the scale parameter (or equivalently the rate, its inverse).
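In SciPy, every continuous distribution is a location-scale family of its standardized form. A tiny check of that relationship for the gamma pdf (the numbers are just the example values from the question):
import numpy as np
from scipy.stats import gamma
a, loc, scale = 12, 0.71, 0.0166
x = np.linspace(0.75, 1.1, 5)
# gamma.pdf(x, a, loc, scale) equals gamma.pdf((x - loc) / scale, a) / scale
print(np.allclose(gamma.pdf(x, a, loc=loc, scale=scale),
                  gamma.pdf((x - loc) / scale, a) / scale))  # True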
SciPy performs the parameter estimation using MLE (see the documentation). After fitting a probability distribution to your data, you should check the goodness of fit; the Kolmogorov–Smirnov test is one common choice, as sketched below.
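A minimal sketch of such a check with scipy.stats.kstest, passing the fitted parameters through args (note that fitting and testing on the same data makes the test optimistic; a parametric bootstrap would be more rigorous):
from scipy.stats import gamma, kstest
data = gamma.rvs(12, loc=0.71, scale=0.0166, size=1000)  # illustrative data
params = gamma.fit(data)
stat, pvalue = kstest(data, "gamma", args=params)
print(stat, pvalue)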
edit
Warning: The following illustrates the use of GenericLikelihoodModel, following the example in the question.
However, in the case of the gamma distribution the location parameter shifts the support of the distribution, which is ruled out by the general regularity assumptions of maximum likelihood estimation. The more standard usage fixes the support with floc=0, so the gamma becomes a two-parameter distribution and standard MLE theory applies.
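For example, a fixed-support fit would look like this (a sketch; the data here are generated without a location shift so that floc=0 is appropriate):
from scipy.stats import gamma
data = gamma.rvs(12, scale=0.0166, size=1000)    # no location shift
shape_hat, loc_hat, scale_hat = gamma.fit(data, floc=0)
print(shape_hat, loc_hat, scale_hat)             # loc_hat is exactly 0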
Statsmodels has a generic class for maximum likelihood estimation, GenericLikelihoodModel. It is not designed specifically for this case, but can be used with a little help (defining the attributes and providing start_params).
import numpy as np
from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import gamma
shape = 12; loc = 0.71; scale = 0.0166
data = gamma.rvs(shape, loc=loc, scale=scale, size=1000)
params = gamma.fit(data) # params close to but not the same as (shape, loc, scale)
# HOW TO ESTIMATE/GET ERRORS FOR EACH PARAM?
print(params)
print('\n')
class Gamma(GenericLikelihoodModel):
    nparams = 3

    def loglike(self, params):
        # log-likelihood of the (shape, loc, scale) parameterization
        return gamma.logpdf(self.endog, *params).sum()

res = Gamma(data).fit(start_params=params)
# set the degrees of freedom by hand so that summary() and AIC/BIC are correct
res.df_model = len(params)
res.df_resid = len(data) - len(params)
print(res.summary())
This prints the following
(10.31888758604304, 0.71645502437403186, 0.018447479022445423)
Optimization terminated successfully.
Current function value: -1.439996
Iterations: 69
Function evaluations: 119
Gamma Results
==============================================================================
Dep. Variable: y Log-Likelihood: 1440.0
Model: Gamma AIC: -2872.
Method: Maximum Likelihood BIC: -2852.
Date: Sun, 12 Jul 2015
Time: 04:00:05
No. Observations: 1000
Df Residuals: 997
Df Model: 3
==============================================================================
coef std err z P>|z| [95.0% Conf. Int.]
------------------------------------------------------------------------------
par0 10.3187 2.242 4.603 0.000 5.925 14.712
par1 0.7165 0.019 37.957 0.000 0.679 0.753
par2 0.0184 0.002 8.183 0.000 0.014 0.023
==============================================================================
Other results based on the maximum likelihood estimates are then also available. For example, a z-test that the first parameter equals 10 can be performed either by specifying a restriction matrix or by using a string expression with the equality:
>>> res.t_test(([1, 0, 0], [10]))
<class 'statsmodels.stats.contrast.ContrastResults'>
Test for Constraints
==============================================================================
coef std err z P>|z| [95.0% Conf. Int.]
------------------------------------------------------------------------------
c0 10.3187 2.242 0.142 0.887 5.925 14.712
==============================================================================
>>> res.t_test('par0=10')
<class 'statsmodels.stats.contrast.ContrastResults'>
Test for Constraints
==============================================================================
coef std err z P>|z| [95.0% Conf. Int.]
------------------------------------------------------------------------------
c0 10.3187 2.242 0.142 0.887 5.925 14.712
==============================================================================
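The standard errors and confidence intervals reported in the summary can also be pulled out programmatically; assuming the res results object from the snippet above:
print(res.params)      # point estimates (shape, loc, scale)
print(res.bse)         # standard errors of the estimates
print(res.conf_int())  # 95% confidence intervals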