Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

glm and relative risk -replicate Stata code in R

Tags:

r

glm

stata

Could anyone help me replicate these relative risk calculations (and its confidence interval) in R?

A similar procedure used in Stata is described here. Could anyone show me how to do this in R (my data has clusters and strata but I've taken a simpler example)? I've tried the relrisk.est function but I'd rather use the survey package as this handles very complex designs. I'd also like to compare Stata and R estimates.I'm using Poisson as suggested here.

###STATA CODE
use http://www.ats.ucla.edu/stat/stata/faq/eyestudy 
tabulate carrot lenses
*same as R binomial svyglm below
xi: glm lenses carrot, fam(bin) 
*switch reference code
char carrot[omit] 1
xi: glm lenses i.carrot, fam(poisson) link(log) nolog robust eform

###R
library(foreign)
library(survey)
D<-read.dta("http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
table(D$lenses,D$carrot)
D$wgt<-rep(1,nrow(D))
Dd<-svydesign(id=~1,data=D,weights=~wgt)
#change category and eform....?
svyglm(lenses~carrot,Dd,family=binomial)
svyglm(lenses~carrot,Dd,family=quasipoisson(log))
like image 519
Marco M Avatar asked Jan 13 '13 08:01

Marco M


1 Answers

Your example is a simple dataset, so you don't need to use the survey package at all. I would also suggest that, when learning multiple regression with R, you start with simpler examples and gradually build up your understanding of the methods involved.

After all, my opinion is that the philosophies of Stata and R differ. Stata easily throws a ton of info in your face, without you knowing what it means or how it is derived. On the other hand R can be just as (or even more) powerful and more versatile, but lacks Stata's "automation" and forces you to go slow to get the results you want.

Here's a more literal translation of your Stata code:

require(foreign)
data <- read.dta("http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
with(data, table(carrot, lenses))
glm.out.1 <- glm(lenses ~ carrot, family="binomial", data=data)
summary(glm.out.1)
logLik(glm.out.1)   # get the log-likelihood for the model, as well
glm.out.2 <- glm(lenses ~ factor(carrot, levels=c(1,0)), family="poisson"(link="log"), data=data)
summary(glm.out.2)
logLik(glm.out.2)
exp(cbind(coefficients(glm.out.2), confint(glm.out.2)))

# deriving robust estimates. vcovHC() is in the sandwich package.
require(sandwich)
robSE <- sqrt(diag(vcovHC(glm.out.2, type="HC")))[2]
rob <- coef(glm.out.2)[2]
rob <- exp(c(rob, rob+qnorm(0.025)*robSE, rob-qnorm(0.025)*robSE))
names(rob) <- c("", "2.5 %", "97.5 %")
rob

Note that the (link="log") in the second glm() call is not necessary, because it is the default link function when family="poisson".

For deriving the "robust" estimates, I had to read this, which was very helpful. You have to use the vcovHC() function in the sandwich package to obtain a different variance-covariance matrix than the one calculated by glm(), and use that to compute the standard error and parameter estimates.

The "robust" estimates were almost identical to what I got from Stata, down to the third decimal. All other results were completely identical; run the code and see for yourself.

Oh, and one more thing: when you find your way using glm() with non-stratified designs, then map your way across the survey package, which contains counterparts of this and other analysis functions modified for complex designs. I also recommend you read Thomas Lumley's book "Complex Surveys: A guide to analysis using R".

like image 177
Theodore Lytras Avatar answered Sep 18 '22 18:09

Theodore Lytras