I am trying to replicate a logit regression from Stata to R. In Stata I use the option "robust" to have the robust standard error (heteroscedasticity-consistent standard error). I am able to replicate the exactly same coefficients from Stata, but I am not able to have the same robust standard error with the package "sandwich".
I have tried some OLS linear regression examples; it seems like the sandwich estimators of R and Stata give me the same robust standard error for OLS. Does anybody know how Stata calculate the sandwich estimator for non-linear regression, in my case the logit regression?
Thank you!
Codes Attached: in R:
library(sandwich)
library(lmtest)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank<-factor(mydata$rank)
myfit<-glm(admit~gre+gpa+rank,data=mydata,family=binomial(link="logit"))
summary(myfit)
coeftest(myfit, vcov = sandwich)
coeftest(myfit, vcov = vcovHC(myfit, "HC0"))
coeftest(myfit, vcov = vcovHC(myfit))
coeftest(myfit, vcov = vcovHC(myfit, "HC3"))
coeftest(myfit, vcov = vcovHC(myfit, "HC1"))
coeftest(myfit, vcov = vcovHC(myfit, "HC2"))
coeftest(myfit, vcov = vcovHC(myfit, "HC"))
coeftest(myfit, vcov = vcovHC(myfit, "const"))
coeftest(myfit, vcov = vcovHC(myfit, "HC4"))
coeftest(myfit, vcov = vcovHC(myfit, "HC4m"))
coeftest(myfit, vcov = vcovHC(myfit, "HC5"))
Stata:
use http://www.ats.ucla.edu/stat/stata/dae/binary.dta, clear
logit admit gre gpa i.rank, robust
Robust standard errors are generally larger than non-robust standard errors, but are sometimes smaller. Clustered standard errorsClustered standard errorsClustered standard errors (or Liang-Zeger standard errors) are measurements that estimate the standard error of a regression parameter in settings where observations may be subdivided into smaller-sized groups ("clusters") and where the sampling and/or treatment assignment is correlated within each group.https://en.wikipedia.org › wiki › Clustered_standard_errorsClustered standard errors - Wikipedia are a special kind of robust standard errors that account for heteroskedasticity across “clusters” of observations (such as states, schools, or individuals).
A robust standard error is a different way of calculating the standard error of a regression coefficient in a regression model. It is also referred to as a sandwich standard error, an Eicker-White standard error, a Huber standard error, a heteroscedasticity consistent standard error, and probably a few other names.
Also -- note that the R^2 and adjusted R^2 values are the same regardless of whether or not you use robust standard errors. So, if you also run regression without the robust option the value is already reported for you.
How do I obtain the standard error of the predicted probability with logistic regression analysis? se(pi) = H'(linear combination) * stdp = pi*(1-pi)*stdp, by properties of the logistic function H().
The default so-called "robust" standard errors in Stata correspond to what sandwich()
from the package of the same name computes. The only difference is how the finite-sample adjustment is done. In the sandwich(...)
function no finite-sample adjustment is done at all by default, i.e., the sandwich is divided by 1/n where n is the number of observations. Alternatively, sandwich(..., adjust = TRUE)
can be used which divides by 1/(n - k) where k is the number of regressors. And Stata divides by 1/(n - 1).
Of course, asymptotically these do not differ at all. And except for a few special cases (e.g., OLS linear regression) there is no argument for 1/(n - k) or 1/(n - 1) to work "correctly" in finite samples (e.g., unbiasedness). At least not to the best of my knowledge.
So to obtain the same results as in Stata you can do do:
sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(myfit, vcov = sandwich1)
This yields
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9899791 1.1380890 -3.5059 0.0004551 ***
gre 0.0022644 0.0011027 2.0536 0.0400192 *
gpa 0.8040375 0.3451359 2.3296 0.0198259 *
rank2 -0.6754429 0.3144686 -2.1479 0.0317228 *
rank3 -1.3402039 0.3445257 -3.8900 0.0001002 ***
rank4 -1.5514637 0.4160544 -3.7290 0.0001922 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And just for the record: In the binary response case, these "robust" standard errors are not robust against anything. Provided that the model is correctly specified, they are consistent and it's ok to use them but they don't guard against any misspecification in the model. Because the basic assumption for the sandwich standard errors to work is that the model equation (or more precisely the corresponding score function) is correctly specified while the rest of the model may be misspecified. However, in a binary regression there is no room for misspecification because the model equation just consists of the mean (= probability) and the likelihood is the mean and 1 - mean, respectively. This is in contrast to linear or count data regression where there may be heteroskedasticity, overdispersion, etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With