A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just logit Y X1 X2 X3, vce(cluster Z)
, but unfortunately I haven't figured out how to do the same analysis in R. Thanks in advance!
Clustered standard errors are a common way to deal with this problem. Unlike Stata, R doesn’t have built-in functionality to estimate clustered standard errors. There are several packages though that add this functionality and this article will introduce three of them, explaining how they can be used and what their advantages and disadvantages are.
To get the standard errors, one performs the same steps as before, after adjusting the degrees of freedom for clusters. For calculating robust standard errors in R, both with more goodies and in (probably) a more efficient way, look at the sandwich package.
There is also a relatively new and convenient package computing cluster-robust standard errors for linear models and generalized linear models. See here. Thanks for contributing an answer to Stack Overflow! Please be sure to answer the question. Provide details and share your research! But avoid …
It seems to me that, in the case of continuous outcomes, robust estimators of standard errors are rather simple, given that variance of residuals for each observation is calculated as the squared (estimated) residuals from the regression.
Another alternative would be to use the sandwich
and lmtest
package as follows. Suppose that z
is a column with the cluster indicators in your dataset dat
. Then
# load libraries
library("sandwich")
library("lmtest")
# fit the logistic regression
fit = glm(y ~ x, data = dat, family = binomial)
# get results with clustered standard errors (of type HC0)
coeftest(fit, vcov. = vcovCL(fit, cluster = dat$z, type = "HC0"))
will do the job.
There is a command glm.cluster
in the R package miceadds
which seems to give the same results for logistic regression as Stata does with the option vce(cluster)
. See the documentation here.
In one of the examples on this page, the commands
mod2 <- miceadds::glm.cluster(data=dat, formula=highmath ~ hisei + female,
cluster="idschool", family="binomial")
summary(mod2)
give the same robust standard errors as the Stata command
logit highmath hisei female, vce(cluster idschool)
e.g. a standard error of 0.004038 for the variable hisei
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With