Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Newey-West standard errors for OLS in Python?

I want to have a coefficient and Newey-West standard error associated with it.

I am looking for Python library (ideally, but any working solutions is fine) that can do what the following R code is doing:

library(sandwich)
library(lmtest)

a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9))
b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9))

temp.lm = lm(a ~ b)

temp.summ <- summary(temp.lm)
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))

print (temp.summ$coefficients)

Result:

             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 2.0576208  2.5230532 0.8155281 0.4358205
b           0.5594796  0.4071834 1.3740235 0.2026817

I get the coefficients and associated with them standard errors.

I see statsmodels.stats.sandwich_covariance.cov_hac module, but I don't see how to make it work with OLS.

like image 204
Akavall Avatar asked May 02 '14 03:05

Akavall


People also ask

What do Newey-West standard errors do?

The Newey-West method handles autocorrelation with lags up to h, and so it is assumed that lags larger than h can be ignored. Note too that Newey-West not only corrects for autocorrelation it also corrects for heteroscedasticity (heterogeneity of variances).

What is Newey-West correction?

A Newey–West estimator is used in statistics and econometrics to provide an estimate of the covariance matrix of the parameters of a regression-type model where the standard assumptions of regression analysis do not apply. It was devised by Whitney K. Newey and Kenneth D.

What are standard errors in OLS?

The standard error of a coefficient indicates the accuracy of the estimated ordinary least squares (OLS) coefficient with respect to its population parameter. Each standard error is the square root of the variance of the corresponding coefficient.

Does Newey-West change coefficients?

Newey-West estimators adjust how the standard errors of the regression coefficients are calculated, but not the standard error of the model (eg. square root of the mean square error).


1 Answers

Edited (10/31/2015) to reflect preferred coding style for statsmodels as fall 2015.

In statsmodels version 0.6.1 you can do the following:

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
                   'b':[3,5,6,2,4,6,7,8,7,8,9]})

reg = smf.ols('a ~ 1 + b',data=df).fit(cov_type='HAC',cov_kwds={'maxlags':1})
print reg.summary()

                                OLS Regression Results
==============================================================================
Dep. Variable:                      a   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.201
Method:                 Least Squares   F-statistic:                     1.949
Date:                Sat, 31 Oct 2015   Prob (F-statistic):              0.196
Time:                        03:15:46   Log-Likelihood:                -22.603
No. Observations:                  11   AIC:                             49.21
Df Residuals:                       9   BIC:                             50.00
Df Model:                           1
Covariance Type:                  HAC
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.0576      2.661      0.773      0.439        -3.157     7.272
b              0.5595      0.401      1.396      0.163        -0.226     1.345
==============================================================================
Omnibus:                        0.361   Durbin-Watson:                   1.468
Prob(Omnibus):                  0.835   Jarque-Bera (JB):                0.331
Skew:                           0.321   Prob(JB):                        0.847
Kurtosis:                       2.442   Cond. No.                         19.1
==============================================================================

Warnings:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction

Or you can use the get_robustcov_results method after fitting the model:

reg = smf.ols('a ~ 1 + b',data=df).fit()
new = reg.get_robustcov_results(cov_type='HAC',maxlags=1)
print new.summary()


                                OLS Regression Results
==============================================================================
Dep. Variable:                      a   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.201
Method:                 Least Squares   F-statistic:                     1.949
Date:                Sat, 31 Oct 2015   Prob (F-statistic):              0.196
Time:                        03:15:46   Log-Likelihood:                -22.603
No. Observations:                  11   AIC:                             49.21
Df Residuals:                       9   BIC:                             50.00
Df Model:                           1
Covariance Type:                  HAC
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.0576      2.661      0.773      0.439        -3.157     7.272
b              0.5595      0.401      1.396      0.163        -0.226     1.345
==============================================================================
Omnibus:                        0.361   Durbin-Watson:                   1.468
Prob(Omnibus):                  0.835   Jarque-Bera (JB):                0.331
Skew:                           0.321   Prob(JB):                        0.847
Kurtosis:                       2.442   Cond. No.                         19.1
==============================================================================

Warnings:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction

The defaults for statsmodels are slightly different than the defaults for the equivalent method in R. The R method can be made equivalent to the statsmodels default (what I did above) by changing the vcov, call to the following:

temp.summ$coefficients <- unclass(coeftest(temp.lm, 
               vcov. = NeweyWest(temp.lm,lag=1,prewhite=FALSE)))
print (temp.summ$coefficients)

             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 2.0576208  2.6605060 0.7733945 0.4591196
b           0.5594796  0.4007965 1.3959193 0.1962142

You can also still do Newey-West in pandas (0.17), although I believe the plan is to deprecate OLS in pandas:

print pd.stats.ols.OLS(df.a,df.b,nw_lags=1)

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         11
Number of Degrees of Freedom:   2

R-squared:         0.2807
Adj R-squared:     0.2007

Rmse:              2.0880

F-stat (1, 9):     1.5943, p-value:     0.2384

Degrees of Freedom: model 1, resid 9

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
 --------------------------------------------------------------------------------
             x     0.5595     0.4431       1.26     0.2384    -0.3090     1.4280
     intercept     2.0576     2.9413       0.70     0.5019    -3.7073     7.8226
*** The calculations are Newey-West adjusted with lags     1

---------------------------------End of Summary---------------------------------
like image 174
Karl D. Avatar answered Oct 07 '22 15:10

Karl D.