I am hoping to run a logit regression which predicts the marginal effect at the mean of family size and age, and the effect of binary indicators (whether an individual is an immigrant, has health insurance, or smokes) on the predicted probability of developing hypertension.
This data comes from a clustered survey, and I am hoping to include robust clustered standard errors in the output.
But when I add the code to include robust cluster SE, I receive an error that the variables in my regression are not longer found and I'm not sure why. Any advice would be great! Thanks.
AGE IMMIGRANT FAMSIZE HLTH_INS HYPERTEN SMOKE PSU
<int> <dbl> <int> <dbl> <dbl> <dbl> <int>
40 0 2 1 0 0 2
23 0 2 1 0 0 1
24 0 2 1 0 0 2
18 0 3 1 1 0 2
30 0 2 1 0 0 2
33 1 6 0 0 0 1
#or if this is an easier output to reproduce:
structure(list(AGE = c(40L, 23L, 24L, 18L, 30L, 33L, 32L, 63L,
22L, 24L), IMMIGRANT = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1), FAMSIZE = c(2L,
2L, 2L, 3L, 2L, 6L, 2L, 1L, 2L, 1L), HLTH_INS = c(1, 1, 1, 1,
1, 0, 1, 1, 1, 0), HYPERTEN = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
SMOKE = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), PSU = c(2L, 1L,
2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L)), row.names = c(NA, -10L), class = "data.frame")
#The regression works without adjusting for clustered SE
logit<-logitmfx(HYPERTEN~scale(AGE)+IMMIGRANT+scale(FAMSIZE)+HLTH_INS+
SMOKE,data=sample,
atmean=TRUE,robust=T)
#However, when I add in the code to cluster SE I receive the error: "Error in scale(AGE) : object 'AGE' not found"
logit<-logitmfx(HYPERTEN~scale(AGE)+IMMIGRANT+scale(FAMSIZE)+HLTH_INS+
SMOKE,data=sample,
atmean=TRUE,robust=T,clustervar1="PSU", clustervar2=NULL,!is.null("PSU"))
Trying to replicate the steps of the function using the source code, Steffen Moritz' solution should indeed work. The problem arises, since first logitmfx
instantly calls another function logitmfxest
.
This function has the same arguments, but it also has the following code:
if(!is.null(clustervar1)){
if(is.null(clustervar2)){
if(!(clustervar1 %in% names(data))){
stop("clustervar1 not in data.frame object")
}
data = data.frame(model.frame(formula, data, na.action=NULL),data[,clustervar1])
names(data)[dim(data)[2]] = clustervar1
data=na.omit(data)
}
if(!is.null(clustervar2)){
if(!(clustervar1 %in% names(data))){
stop("clustervar1 not in data.frame object")
}
if(!(clustervar2 %in% names(data))){
stop("clustervar2 not in data.frame object")
}
data = data.frame(model.frame(formula, data, na.action=NULL),
data[,c(clustervar1,clustervar2)])
names(data)[c(dim(data)[2]-1):dim(data)[2]] = c(clustervar1,clustervar2)
data=na.omit(data)
}
}
From this the following code gets activated in your case:
if(!is.null(clustervar1)){
if(is.null(clustervar2)){
data = data.frame(model.frame(formula, data, na.action=NULL),data[,clustervar1])
names(data)[dim(data)[2]] = clustervar1
data=na.omit(data)
}
}
This redefines "data" to be a data.frame build on the model.frame. But the model frame uses names from you formula, so suddenly column 2 is called scale.AGE. and column 3 is called scale.FAMSIZE..
This is a big problem since the function then calls a generalized linear model:
fit = glm(formula, data=data, family = binomial(link = "logit"), x=T,
start = start, control = control)
where it uses your original formula containing scale(AGE) and scale(FAMSIZE), but with the new dataframe with the renamed columns.
So scaling before inputting should work. And indeed any other function will, as Steffen mentioned, cause the same error, since they will produce a similar renaming of columns when model.frame
is called.
Strange, somehow can't recognize the functions in the formula
any more.
You can try this, if you remove scale
it works fine. Also no other function like log()
seems to work.
You could try calculating scale(AGE) before, then you don't need to put it in the formula.
Could look like this:
sample$AGE<-scale(sample$AGE)
sample$FAMSIZE<-scale(sample$FAMSIZE)
form <- as.formula(HYPERTEN~AGE+IMMIGRANT+FAMSIZE+HLTH_INS+SMOKE)
#However, when I add in the code to cluster SE I receive the error: "Error in scale(AGE) : object 'AGE' not found"
logit<-logitmfx(form,data=sample,
atmean=TRUE,robust=T,clustervar1="PSU", clustervar2=NULL)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With