I want to have a coefficient and Newey-West standard error associated with it. I am looking for Python library (ideally, but any working solutions is fine) that can do what the following R code is doing: <pre class="prettyprint"><code>library(sandwich) library(lmtest) a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9)) b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9)) temp.lm = lm(a ~ b) temp.summ <- summary(temp.lm) temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest)) print (temp.summ$coefficients) </code></pre> Result: <pre class="prettyprint"><code> Estimate Std. Error t value Pr(>|t|) (Intercept) 2.0576208 2.5230532 0.8155281 0.4358205 b 0.5594796 0.4071834 1.3740235 0.2026817 </code></pre> I get the coefficients and associated with them standard errors. I see statsmodels.stats.sandwich_covariance.cov_hac module, but I don't see how to make it work with OLS.

Edited (10/31/2015) to reflect preferred coding style for <code>statsmodels</code> as fall 2015. In <code>statsmodels</code> version 0.6.1 you can do the following: <pre class="prettyprint"><code>import pandas as pd import numpy as np import statsmodels.formula.api as smf df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9], 'b':[3,5,6,2,4,6,7,8,7,8,9]}) reg = smf.ols('a ~ 1 + b',data=df).fit(cov_type='HAC',cov_kwds={'maxlags':1}) print reg.summary() OLS Regression Results ============================================================================== Dep. Variable: a R-squared: 0.281 Model: OLS Adj. R-squared: 0.201 Method: Least Squares F-statistic: 1.949 Date: Sat, 31 Oct 2015 Prob (F-statistic): 0.196 Time: 03:15:46 Log-Likelihood: -22.603 No. Observations: 11 AIC: 49.21 Df Residuals: 9 BIC: 50.00 Df Model: 1 Covariance Type: HAC ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ Intercept 2.0576 2.661 0.773 0.439 -3.157 7.272 b 0.5595 0.401 1.396 0.163 -0.226 1.345 ============================================================================== Omnibus: 0.361 Durbin-Watson: 1.468 Prob(Omnibus): 0.835 Jarque-Bera (JB): 0.331 Skew: 0.321 Prob(JB): 0.847 Kurtosis: 2.442 Cond. No. 19.1 ============================================================================== Warnings: [1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction </code></pre> Or you can use the <code>get_robustcov_results</code> method after fitting the model: <pre class="prettyprint"><code>reg = smf.ols('a ~ 1 + b',data=df).fit() new = reg.get_robustcov_results(cov_type='HAC',maxlags=1) print new.summary() OLS Regression Results ============================================================================== Dep. Variable: a R-squared: 0.281 Model: OLS Adj. R-squared: 0.201 Method: Least Squares F-statistic: 1.949 Date: Sat, 31 Oct 2015 Prob (F-statistic): 0.196 Time: 03:15:46 Log-Likelihood: -22.603 No. Observations: 11 AIC: 49.21 Df Residuals: 9 BIC: 50.00 Df Model: 1 Covariance Type: HAC ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ Intercept 2.0576 2.661 0.773 0.439 -3.157 7.272 b 0.5595 0.401 1.396 0.163 -0.226 1.345 ============================================================================== Omnibus: 0.361 Durbin-Watson: 1.468 Prob(Omnibus): 0.835 Jarque-Bera (JB): 0.331 Skew: 0.321 Prob(JB): 0.847 Kurtosis: 2.442 Cond. No. 19.1 ============================================================================== Warnings: [1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction </code></pre> The defaults for <code>statsmodels</code> are slightly different than the defaults for the equivalent method in <code>R</code>. The <code>R</code> method can be made equivalent to the <code>statsmodels</code> default (what I did above) by changing the <code>vcov,</code> call to the following: <pre class="prettyprint lang-R prettyprint-override"><code>temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest(temp.lm,lag=1,prewhite=FALSE))) print (temp.summ$coefficients) Estimate Std. Error t value Pr(>|t|) (Intercept) 2.0576208 2.6605060 0.7733945 0.4591196 b 0.5594796 0.4007965 1.3959193 0.1962142 </code></pre> You can also still do Newey-West in pandas (0.17), although I believe the plan is to deprecate OLS in pandas: <pre class="prettyprint"><code>print pd.stats.ols.OLS(df.a,df.b,nw_lags=1) -------------------------Summary of Regression Analysis------------------------- Formula: Y ~ <x> + <intercept> Number of Observations: 11 Number of Degrees of Freedom: 2 R-squared: 0.2807 Adj R-squared: 0.2007 Rmse: 2.0880 F-stat (1, 9): 1.5943, p-value: 0.2384 Degrees of Freedom: model 1, resid 9 -----------------------Summary of Estimated Coefficients------------------------ Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5% -------------------------------------------------------------------------------- x 0.5595 0.4431 1.26 0.2384 -0.3090 1.4280 intercept 2.0576 2.9413 0.70 0.5019 -3.7073 7.8226 *** The calculations are Newey-West adjusted with lags 1 ---------------------------------End of Summary--------------------------------- </code></pre>

Newey-West standard errors for OLS in Python?

Tags:

python

statistics

time-series

statsmodels

I want to have a coefficient and Newey-West standard error associated with it.

I am looking for Python library (ideally, but any working solutions is fine) that can do what the following R code is doing:

library(sandwich)
library(lmtest)

a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9))
b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9))

temp.lm = lm(a ~ b)

temp.summ <- summary(temp.lm)
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))

print (temp.summ$coefficients)

Result:

             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 2.0576208  2.5230532 0.8155281 0.4358205
b           0.5594796  0.4071834 1.3740235 0.2026817

I get the coefficients and associated with them standard errors.

I see statsmodels.stats.sandwich_covariance.cov_hac module, but I don't see how to make it work with OLS.

204

asked May 02 '14 03:05

Akavall

1 Answers

Edited (10/31/2015) to reflect preferred coding style for statsmodels as fall 2015.

In statsmodels version 0.6.1 you can do the following:

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
                   'b':[3,5,6,2,4,6,7,8,7,8,9]})

reg = smf.ols('a ~ 1 + b',data=df).fit(cov_type='HAC',cov_kwds={'maxlags':1})
print reg.summary()

                                OLS Regression Results
==============================================================================
Dep. Variable:                      a   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.201
Method:                 Least Squares   F-statistic:                     1.949
Date:                Sat, 31 Oct 2015   Prob (F-statistic):              0.196
Time:                        03:15:46   Log-Likelihood:                -22.603
No. Observations:                  11   AIC:                             49.21
Df Residuals:                       9   BIC:                             50.00
Df Model:                           1
Covariance Type:                  HAC
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.0576      2.661      0.773      0.439        -3.157     7.272
b              0.5595      0.401      1.396      0.163        -0.226     1.345
==============================================================================
Omnibus:                        0.361   Durbin-Watson:                   1.468
Prob(Omnibus):                  0.835   Jarque-Bera (JB):                0.331
Skew:                           0.321   Prob(JB):                        0.847
Kurtosis:                       2.442   Cond. No.                         19.1
==============================================================================

Warnings:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction

Or you can use the get_robustcov_results method after fitting the model:

reg = smf.ols('a ~ 1 + b',data=df).fit()
new = reg.get_robustcov_results(cov_type='HAC',maxlags=1)
print new.summary()


                                OLS Regression Results
==============================================================================
Dep. Variable:                      a   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.201
Method:                 Least Squares   F-statistic:                     1.949
Date:                Sat, 31 Oct 2015   Prob (F-statistic):              0.196
Time:                        03:15:46   Log-Likelihood:                -22.603
No. Observations:                  11   AIC:                             49.21
Df Residuals:                       9   BIC:                             50.00
Df Model:                           1
Covariance Type:                  HAC
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.0576      2.661      0.773      0.439        -3.157     7.272
b              0.5595      0.401      1.396      0.163        -0.226     1.345
==============================================================================
Omnibus:                        0.361   Durbin-Watson:                   1.468
Prob(Omnibus):                  0.835   Jarque-Bera (JB):                0.331
Skew:                           0.321   Prob(JB):                        0.847
Kurtosis:                       2.442   Cond. No.                         19.1
==============================================================================

Warnings:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction

The defaults for statsmodels are slightly different than the defaults for the equivalent method in R. The R method can be made equivalent to the statsmodels default (what I did above) by changing the vcov, call to the following:

temp.summ$coefficients <- unclass(coeftest(temp.lm, 
               vcov. = NeweyWest(temp.lm,lag=1,prewhite=FALSE)))
print (temp.summ$coefficients)

             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 2.0576208  2.6605060 0.7733945 0.4591196
b           0.5594796  0.4007965 1.3959193 0.1962142

You can also still do Newey-West in pandas (0.17), although I believe the plan is to deprecate OLS in pandas:

print pd.stats.ols.OLS(df.a,df.b,nw_lags=1)

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         11
Number of Degrees of Freedom:   2

R-squared:         0.2807
Adj R-squared:     0.2007

Rmse:              2.0880

F-stat (1, 9):     1.5943, p-value:     0.2384

Degrees of Freedom: model 1, resid 9

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
 --------------------------------------------------------------------------------
             x     0.5595     0.4431       1.26     0.2384    -0.3090     1.4280
     intercept     2.0576     2.9413       0.70     0.5019    -3.7073     7.8226
*** The calculations are Newey-West adjusted with lags     1

---------------------------------End of Summary---------------------------------

174

answered Oct 07 '22 15:10

Karl D.

Related questions
                            
                                Are *parameters calls lazy? [duplicate]
                            
                                Plot image color histogram using matplotlib
                            
                                REST post using Python-Request
                            
                                How do I output a list of dictionaries to an Excel sheet?
                            
                                Python isnumeric function works only on unicode
                            
                                What could be the reason for a socket error "[Errno 9] Bad file descriptor"
                            
                                "UnboundLocalError: local variable referenced before assignment" when incrementing variable in function [duplicate]
                            
                                How to check if two keys in dictionary hold the same value
                            
                                Django: DateField "This field cannot be blank."
                            
                                Make two directories static in django
                            
                                Why is numpy.random.choice so slow?
                            
                                CORS - Using AJAX to post on a Python (webapp2) web service
                            
                                How to install latest version of Django 1.5 using pip?
                            
                                Floating Point Numbers [duplicate]
                            
                                How can Tornado serve a single static file at an arbitrary location?
                            
                                ReferenceError: "something" is not defined in QML
                            
                                EVE - define custom flask controllers [closed]
                            
                                How To Delete S3 Files Starting With
                            
                                Does there exist empty class in python?
                            
                                Custom sorting with Pandas

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With