Different Robust Standard Errors of Logit Regression in Stata and R

Tags:

I am trying to replicate a logit regression from Stata to R. In Stata I use the option "robust" to have the robust standard error (heteroscedasticity-consistent standard error). I am able to replicate the exactly same coefficients from Stata, but I am not able to have the same robust standard error with the package "sandwich".

I have tried some OLS linear regression examples; it seems like the sandwich estimators of R and Stata give me the same robust standard error for OLS. Does anybody know how Stata calculate the sandwich estimator for non-linear regression, in my case the logit regression?

Thank you!

Codes Attached: in R:

library(sandwich)
library(lmtest)    
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")    
mydata$rank<-factor(mydata$rank)    
myfit<-glm(admit~gre+gpa+rank,data=mydata,family=binomial(link="logit"))    
summary(myfit)    
coeftest(myfit, vcov = sandwich)    
coeftest(myfit, vcov = vcovHC(myfit, "HC0"))    
coeftest(myfit, vcov = vcovHC(myfit))    
coeftest(myfit, vcov = vcovHC(myfit, "HC3"))    
coeftest(myfit, vcov = vcovHC(myfit, "HC1"))    
coeftest(myfit, vcov = vcovHC(myfit, "HC2"))    
coeftest(myfit, vcov = vcovHC(myfit, "HC"))    
coeftest(myfit, vcov = vcovHC(myfit, "const"))    
coeftest(myfit, vcov = vcovHC(myfit, "HC4"))    
coeftest(myfit, vcov = vcovHC(myfit, "HC4m"))    
coeftest(myfit, vcov = vcovHC(myfit, "HC5"))

Stata:

use http://www.ats.ucla.edu/stat/stata/dae/binary.dta, clear    
logit admit gre gpa i.rank, robust

687

asked Dec 08 '14 22:12

chl111

People also ask

What is the difference between robust standard errors?

Robust standard errors are generally larger than non-robust standard errors, but are sometimes smaller. Clustered standard errorsClustered standard errorsClustered standard errors (or Liang-Zeger standard errors) are measurements that estimate the standard error of a regression parameter in settings where observations may be subdivided into smaller-sized groups ("clusters") and where the sampling and/or treatment assignment is correlated within each group.https://en.wikipedia.org › wiki › Clustered_standard_errorsClustered standard errors - Wikipedia are a special kind of robust standard errors that account for heteroskedasticity across “clusters” of observations (such as states, schools, or individuals).

What are robust standard errors in regression?

A robust standard error is a different way of calculating the standard error of a regression coefficient in a regression model. It is also referred to as a sandwich standard error, an Eicker-White standard error, a Huber standard error, a heteroscedasticity consistent standard error, and probably a few other names.

Do robust standard errors change r squared?

Also -- note that the R^2 and adjusted R^2 values are the same regardless of whether or not you use robust standard errors. So, if you also run regression without the robust option the value is already reported for you.

How do you find standard error in logistic regression?

How do I obtain the standard error of the predicted probability with logistic regression analysis? se(pi) = H'(linear combination) * stdp = pi*(1-pi)*stdp, by properties of the logistic function H().

1 Answers

The default so-called "robust" standard errors in Stata correspond to what sandwich() from the package of the same name computes. The only difference is how the finite-sample adjustment is done. In the sandwich(...) function no finite-sample adjustment is done at all by default, i.e., the sandwich is divided by 1/n where n is the number of observations. Alternatively, sandwich(..., adjust = TRUE) can be used which divides by 1/(n - k) where k is the number of regressors. And Stata divides by 1/(n - 1).

Of course, asymptotically these do not differ at all. And except for a few special cases (e.g., OLS linear regression) there is no argument for 1/(n - k) or 1/(n - 1) to work "correctly" in finite samples (e.g., unbiasedness). At least not to the best of my knowledge.

So to obtain the same results as in Stata you can do do:

sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(myfit, vcov = sandwich1)

This yields

z test of coefficients:

              Estimate Std. Error z value  Pr(>|z|)    
(Intercept) -3.9899791  1.1380890 -3.5059 0.0004551 ***
gre          0.0022644  0.0011027  2.0536 0.0400192 *  
gpa          0.8040375  0.3451359  2.3296 0.0198259 *  
rank2       -0.6754429  0.3144686 -2.1479 0.0317228 *  
rank3       -1.3402039  0.3445257 -3.8900 0.0001002 ***
rank4       -1.5514637  0.4160544 -3.7290 0.0001922 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And just for the record: In the binary response case, these "robust" standard errors are not robust against anything. Provided that the model is correctly specified, they are consistent and it's ok to use them but they don't guard against any misspecification in the model. Because the basic assumption for the sandwich standard errors to work is that the model equation (or more precisely the corresponding score function) is correctly specified while the rest of the model may be misspecified. However, in a binary regression there is no room for misspecification because the model equation just consists of the mean (= probability) and the likelihood is the mean and 1 - mean, respectively. This is in contrast to linear or count data regression where there may be heteroskedasticity, overdispersion, etc.

156

answered Sep 29 '22 14:09

Achim Zeileis

Related questions
                            
                                How to handle DB passwords in R connection strings?
                            
                                How to convert .Rdata format into text file format
                            
                                How to display the median value in a boxplot in ggplot?
                            
                                For loop over unique values
                            
                                R: add title to wordcloud graphics / png
                            
                                Cycling through point shapes when more than 6 factor levels
                            
                                ggplot2, Ordering y axis
                            
                                Order a "mixed" vector (numbers with letters)
                            
                                Match a vector to a list of vectors
                            
                                How to increase stack space overflow for pandoc in R?
                            
                                How to check to see if a function in R is a constant function?
                            
                                Use loop to generate section of text in rmarkdown
                            
                                Add download buttons in DT::renderDataTable
                            
                                Using sqldf and RPostgreSQL together
                            
                                R Cairo installation without apt-get, sudo
                            
                                plot small region of a large polygon map in ggplot2
                            
                                R coxph() warning: Loglik converged before variable
                            
                                R: Generate sequence of powers of ten
                            
                                R Language: How to Set ylim?
                            
                                How can I install topicmodels package in R?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Different Robust Standard Errors of Logit Regression in Stata and R

Tags:

r

logistic-regression

regression

standard-error

stata

chl111

People also ask

1 Answers

Achim Zeileis

Recent Activity

Donate For Us