 

Getting statsmodels to use heteroskedasticity corrected standard errors in coefficient t-tests

I've been digging into the API of statsmodels.regression.linear_model.RegressionResults and have found how to retrieve different flavors of heteroskedasticity corrected standard errors (via properties like HC0_se, etc.) However, I can't quite figure out how to get the t-tests on the coefficients to use these corrected standard errors. Is there a way to do this in the API, or do I have to do it manually? If the latter, can you suggest any guidance on how to do this with statsmodels results?

sparc_spread, asked May 31 '15


People also ask

How do you fix Heteroskedasticity in Python?

How to fix the problem: Log-transform the y variable to 'dampen down' some of the heteroscedasticity, then build an OLSR model for log(y). Use a Generalized Linear Model (GLM) such as the Negative Binomial regression model which does not assume that the data set is homoscedastic.

Does Heteroskedasticity inflate standard errors?

Heteroskedasticity introduces bias into estimators of the standard error of regression coefficients, making the t-tests for the significance of individual regression coefficients unreliable. More specifically, it typically results in underestimated standard errors and therefore inflated t-statistics.

Why do we use heteroskedasticity robust standard errors?

Heteroskedasticity-consistent standard errors are used to allow the fitting of a model that does contain heteroskedastic residuals. The first such approach was proposed by Huber (1967), and further improved procedures have been produced since for cross-sectional data, time-series data and GARCH estimation.

What is HAC test?

The estimator is used to try to overcome autocorrelation (also called serial correlation), and heteroskedasticity in the error terms in the models, often for regressions applied to time series data. The abbreviation "HAC," sometimes used for the estimator, stands for "heteroskedasticity and autocorrelation consistent."


1 Answer

The fit method of the linear models, discrete models and GLM takes a cov_type and a cov_kwds argument for specifying robust covariance matrices. The chosen covariance will be attached to the results instance and used for all inference and for the statistics reported in the summary table.

Unfortunately, the documentation doesn't really show this yet in an appropriate way. The auxiliary method that actually selects the sandwiches based on the options shows the options and required arguments: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.fit.html

For example, estimating an OLS model and using HC3 covariance matrices can be done with

model_ols = OLS(...)
result = model_ols.fit(cov_type='HC3')
result.bse
result.t_test(....)

Some sandwiches require additional arguments. For example, cluster robust standard errors can be selected in the following way, assuming mygroups is an array that contains the group labels:

results = OLS(...).fit(cov_type='cluster', cov_kwds={'groups': mygroups})
results.bse
...

Some robust covariance matrices make additional assumptions about the data without checking them. For example, heteroscedasticity and autocorrelation robust (Newey-West, HAC) standard errors assume a sequential time series structure. Some panel data robust standard errors also assume stacking of the time series by individuals.

A separate option use_t is available to specify whether the t and F distributions or the normal and chi-square distributions should be used by default for Wald tests and confidence intervals.

Josef, answered Sep 23 '22