I've been digging into the API of statsmodels.regression.linear_model.RegressionResults and have found how to retrieve different flavors of heteroskedasticity-corrected standard errors (via properties like HC0_se, etc.). However, I can't quite figure out how to get the t-tests on the coefficients to use these corrected standard errors. Is there a way to do this in the API, or do I have to do it manually? If the latter, can you suggest any guidance on how to do this with statsmodels results?
How to fix the problem: Log-transform the y variable to dampen some of the heteroscedasticity, then build an OLS model for log(y). Alternatively, use a Generalized Linear Model (GLM) such as the Negative Binomial regression model, which does not assume that the data set is homoscedastic.
Heteroskedasticity introduces bias into estimators of the standard errors of the regression coefficients, making the t-tests for the significance of individual regression coefficients unreliable. More specifically, it typically results in underestimated standard errors and therefore inflated t-statistics.
Heteroskedasticity-consistent standard errors are used to allow fitting a model that contains heteroskedastic residuals. The first such approach was proposed by Huber (1967), and improved procedures have since been produced for cross-sectional data, time-series data, and GARCH estimation.
The HAC estimator is used to overcome both autocorrelation (also called serial correlation) and heteroskedasticity in the error terms of a model, often in regressions applied to time-series data. The abbreviation "HAC" stands for "heteroskedasticity and autocorrelation consistent."
The fit method of the linear models, the discrete models, and GLM takes a cov_type and a cov_kwds argument for specifying robust covariance matrices. The chosen covariance is attached to the results instance and used for all inference and statistics reported in the summary table.
Unfortunately, the documentation doesn't really show this in an appropriate way yet. The auxiliary method that actually selects the sandwiches based on the options lists the available options and their required arguments: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.fit.html
For example, estimating an OLS model and using HC3 covariance matrices can be done with
model_ols = OLS(...)
result = model_ols.fit(cov_type='HC3')
result.bse
result.t_test(....)
Some sandwiches require additional arguments. For example, cluster-robust standard errors can be selected in the following way, assuming mygroups is an array that contains the group labels:
results = OLS(...).fit(cov_type='cluster', cov_kwds={'groups': mygroups})
results.bse
...
Some robust covariance matrices make additional assumptions about the data without checking them. For example, heteroscedasticity- and autocorrelation-robust (HAC, or Newey-West) standard errors assume a sequential time-series structure. Some panel-data robust standard errors also assume that the time series are stacked by individual.
A separate option, use_t, is available to specify whether the t and F distributions or the normal and chi-square distributions should be used by default for Wald tests and confidence intervals.