
Regression with Heteroskedasticity-Corrected Standard Errors

Tags: r, stata

I would like to find the R implementation that most closely resembles Stata output for fitting a least squares regression with heteroskedasticity-corrected standard errors. Specifically, I would like the corrected standard errors to appear in the "summary", without having to do additional calculations for my initial round of hypothesis testing. I am looking for a solution that is as "clean" as what EViews and Stata provide.

So far, using the "lmtest" package the best I can come up with is:

library(lmtest)  # coeftest()
library(car)     # hccm(): heteroskedasticity-corrected covariance matrix
model <- lm(...)
coeftest(model, vcov = hccm)

This gives me the output that I want, but it does not seem to be using "coeftest" for its stated purpose. I would also have to read the R^2, F statistic, etc. off a summary that still reports the incorrect standard errors. I feel that there should exist a "one line" solution to this problem given how dynamic R is.

Thanks

Asked Dec 08 '10 by JJJ

People also ask

How does heteroskedasticity affect standard errors?

"Heteroscedasticity" makes it difficult to estimate the true standard deviation of the forecast errors. This can lead to confidence intervals that are too wide or too narrow (in particular they will be too narrow for out-of-sample predictions, if the variance of the errors is increasing over time).

Why do we use robust standard errors to correct for heteroskedasticity?

“Robust” standard errors are a technique to obtain consistent standard errors for OLS coefficients in the presence of heteroscedasticity. Remember, heteroscedasticity violates the Gauss-Markov assumptions that are necessary to render OLS the best linear unbiased estimator (BLUE).
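For intuition, the heteroscedasticity-robust ("sandwich") variance can be computed by hand. The sketch below (variable names are illustrative) should reproduce sandwich::vcovHC(fit, type = "HC0") for a simple fit.

library(sandwich)

fit <- lm(mpg ~ hp, data = mtcars)
X <- model.matrix(fit)                 # design matrix
e <- residuals(fit)                    # OLS residuals
bread <- solve(crossprod(X))           # (X'X)^{-1}
meat  <- crossprod(X * e)              # X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread   # White's HC0 sandwich estimator
sqrt(diag(vcov_hc0))                   # robust standard errors
sqrt(diag(vcovHC(fit, type = "HC0")))  # same numbers from the sandwich package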

How do you fix heteroskedasticity in regression?

Another way to fix heteroscedasticity is to use weighted regression. This type of regression assigns a weight to each data point based on the variance of its fitted value. Essentially, this gives small weights to data points that have higher variances, which shrinks their squared residuals.

How can heteroskedasticity be corrected?

One way to correct for heteroscedasticity is to compute the weighted least squares (WLS) estimator using a hypothesized specification for the variance. Often this specification is one of the regressors or its square.
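A minimal WLS sketch under that kind of assumption (here the error variance is taken to be proportional to the regressor x, so the weights are 1/x):

x <- runif(100, 1, 10)
y <- 1 + 2 * x + rnorm(100, sd = sqrt(x))  # Var(error) proportional to x
wls <- lm(y ~ x, weights = 1 / x)          # weight = 1 / hypothesized variance
summary(wls)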


3 Answers

I think you are on the right track with coeftest() in package lmtest. Take a look at the sandwich package, which provides this functionality and is designed to work hand in hand with the lmtest package you have already found.

> library(sandwich)   # vcovHC() and friends
> library(lmtest)     # coeftest(), waldtest()
> # generate linear regression relationship
> # with homoskedastic variances
> x <- sin(1:100)
> y <- 1 + x + rnorm(100)
> ## model fit and HC3 covariance (the vcovHC() default)
> fm <- lm(y ~ x)
> vcovHC(fm)
            (Intercept)           x
(Intercept) 0.010809366 0.001209603
x           0.001209603 0.018353076
> coeftest(fm, vcov. = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

To get the F test, look at function waldtest():

> waldtest(fm, vcov = vcovHC)
Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You could always cook up a simple function to combine these two for you if you wanted the one-liner...

The "Econometric Computing with HC and HAC Covariance Matrix Estimators" vignette that comes with the sandwich package has lots of examples of linking lmtest and sandwich to do what you want.
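Assuming the vignette is installed under the name "sandwich", you can open it from within R:

vignette("sandwich", package = "sandwich")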

Edit: A one-liner could be as simple as:

mySummary <- function(model, VCOV) {
    print(coeftest(model, vcov. = VCOV))  # coefficient tests with the robust vcov
    print(waldtest(model, vcov = VCOV))   # robust Wald (F) test of the model
}

Which we can use like this (on the examples from above):

> mySummary(fm, vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Answered Oct 05 '22 by Gavin Simpson


I found an R function that does exactly what you are looking for. It gives you robust standard errors without having to do additional calculations. You run summary() on an lm object and, if you set the parameter robust=T, it gives you back Stata-like heteroscedasticity-consistent standard errors.

summary(lm.object, robust = T)  # uses the modified summary() from the linked post

You can find the function on https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r/

Answered Oct 05 '22 by Sandra Lopez


There is now a one-line solution using lm_robust() from the estimatr package, which you can install from CRAN with install.packages("estimatr").

> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, se_type = "stata")
> summary(lmro)

Call:
lm_robust(formula = mpg ~ hp, data = mtcars, se_type = "stata")

Standard error type:  HC1 

Coefficients:
            Estimate Std. Error  Pr(>|t|) CI Lower CI Upper DF
(Intercept) 30.09886    2.07661 4.348e-15 25.85785 34.33987 30
hp          -0.06823    0.01356 2.132e-05 -0.09592 -0.04053 30

Multiple R-squared:  0.6024 ,   Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

You can also get tidy output:

> tidy(lmro)
         term    estimate std.error      p.value    ci.lower
1 (Intercept) 30.09886054 2.0766149 4.347723e-15 25.85784704
2          hp -0.06822828 0.0135604 2.131785e-05 -0.09592231
     ci.upper df outcome
1 34.33987404 30     mpg
2 -0.04053425 30     mpg

The "stata" standard errors default to "HC1" standard errors, which are the default rob standard errors in Stata. You can also get "classical", "HC0", "HC1", "HC2", "HC3" and various clustered standard errors as well (including those that match Stata).

Answered Oct 05 '22 by luke.sonnet