I would like to find the R implementation that most closely resembles Stata output for fitting a Least Squares Regression function with Heteroskedastic Corrected Standard Errors. Specifically I would like the corrected standard errors to be in the "summary" and not have to do additional calculations for my initial round of hypothesis testing. I am looking for a solution that is as "clean" as what Eviews and Stata provide.
So far, using the "lmtest" package the best I can come up with is:
model <- lm(...)
coeftest(model, vcov = hccm)
This gives me the output that I want, but it does not seem to be using "coeftest" for its stated purpose. I would also have to use the summary with the incorrect standard errors to read off the R^2 and F stat, etc. I feel that there should exist a "one line" solution to this problem given how dynamic R is.
Thanks
"Heteroscedasticity" makes it difficult to estimate the true standard deviation of the forecast errors. This can lead to confidence intervals that are too wide or too narrow (in particular they will be too narrow for out-of-sample predictions, if the variance of the errors is increasing over time).
“Robust” standard errors is a technique to obtain unbiased standard errors of OLS coefficients under heteroscedasticity. Remember, the presence of heteroscedasticity violates the Gauss Markov assumptions that are necessary to render OLS the best linear unbiased estimator (BLUE).
Another way to fix heteroscedasticity is to use weighted regression. This type of regression assigns a weight to each data point based on the variance of its fitted value. What is this? Essentially, this gives small weights to data points that have higher variances, which shrinks their squared residuals.
Correcting for Heteroscedasticity One way to correct for heteroscedasticity is to compute the weighted least squares (WLS) estimator using an hypothesized specification for the variance. Often this specification is one of the regressors or its square.
I think you are on the right track with coeftest
in package lmtest. Take a look at the sandwich package which includes this functionality and is designed to work hand in hand with the lmtest package you have already found.
> # generate linear regression relationship
> # with Homoskedastic variances
> x <- sin(1:100)
> y <- 1 + x + rnorm(100)
> ## model fit and HC3 covariance
> fm <- lm(y ~ x)
> vcovHC(fm)
(Intercept) x
(Intercept) 0.010809366 0.001209603
x 0.001209603 0.018353076
> coeftest(fm, vcov. = vcovHC)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.01973 0.10397 9.8081 3.159e-16 ***
x 0.93992 0.13547 6.9381 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To get the F test, look at function waldtest()
:
> waldtest(fm, vcov = vcovHC)
Wald test
Model 1: y ~ x
Model 2: y ~ 1
Res.Df Df F Pr(>F)
1 98
2 99 -1 48.137 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You could always cook up a simple function to combine these two for you if you wanted the one-liner...
There are lots of examples in the Econometric Computing with HC and HAC Covariance Matrix Estimators vignette that comes with the sandwich package of linking lmtest and sandwich to do what you want.
Edit: A one-liner could be as simple as:
mySummary <- function(model, VCOV) {
print(coeftest(model, vcov. = VCOV))
print(waldtest(model, vcov = VCOV))
}
Which we can use like this (on the examples from above):
> mySummary(fm, vcovHC)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.01973 0.10397 9.8081 3.159e-16 ***
x 0.93992 0.13547 6.9381 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Wald test
Model 1: y ~ x
Model 2: y ~ 1
Res.Df Df F Pr(>F)
1 98
2 99 -1 48.137 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I found an R function that does exactly what you are looking for. It gives you robust standard errors without having to do additional calculations. You run summary()
on an lm.object and if you set the parameter robust=T
it gives you back Stata-like heteroscedasticity consistent standard errors.
summary(lm.object, robust=T)
You can find the function on https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r/
There is now a one-line solution using lm_robust
from the estimatr
package, which you can install from CRAN install.packages(estimatr)
.
> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, se_type = "stata")
> summary(lmro)
Call:
lm_robust(formula = mpg ~ hp, data = mtcars, se_type = "stata")
Standard error type: HC1
Coefficients:
Estimate Std. Error Pr(>|t|) CI Lower CI Upper DF
(Intercept) 30.09886 2.07661 4.348e-15 25.85785 34.33987 30
hp -0.06823 0.01356 2.132e-05 -0.09592 -0.04053 30
Multiple R-squared: 0.6024 , Adjusted R-squared: 0.5892
F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
You can also get tidy output:
> tidy(lmro)
term estimate std.error p.value ci.lower
1 (Intercept) 30.09886054 2.0766149 4.347723e-15 25.85784704
2 hp -0.06822828 0.0135604 2.131785e-05 -0.09592231
ci.upper df outcome
1 34.33987404 30 mpg
2 -0.04053425 30 mpg
The "stata"
standard errors default to "HC1" standard errors, which are the default rob
standard errors in Stata. You can also get "classical", "HC0", "HC1", "HC2", "HC3"
and various clustered standard errors as well (including those that match Stata).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With