Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error with logitmfx in R to calculate robust cluster standard error

I am hoping to run a logit regression which predicts the marginal effect at the mean of family size and age, and the effect of binary indicators (whether an individual is an immigrant, has health insurance, or smokes) on the predicted probability of developing hypertension.

This data comes from a clustered survey, and I am hoping to include robust clustered standard errors in the output.

But when I add the code to include robust cluster SE, I receive an error that the variables in my regression are not longer found and I'm not sure why. Any advice would be great! Thanks.

AGE       IMMIGRANT     FAMSIZE     HLTH_INS    HYPERTEN   SMOKE    PSU
<int>       <dbl>         <int>       <dbl>       <dbl>     <dbl>  <int>
40           0              2          1            0         0      2
23           0              2          1            0         0      1
24           0              2          1            0         0      2
18           0              3          1            1         0      2
30           0              2          1            0         0      2
33           1              6          0            0         0      1

#or if this is an easier output to reproduce:
structure(list(AGE = c(40L, 23L, 24L, 18L, 30L, 33L, 32L, 63L, 
22L, 24L), IMMIGRANT = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1), FAMSIZE = c(2L, 
2L, 2L, 3L, 2L, 6L, 2L, 1L, 2L, 1L), HLTH_INS = c(1, 1, 1, 1, 
1, 0, 1, 1, 1, 0), HYPERTEN = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0), 
    SMOKE = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), PSU = c(2L, 1L, 
    2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L)), row.names = c(NA, -10L), class = "data.frame")


#The regression works without adjusting for clustered SE 
logit<-logitmfx(HYPERTEN~scale(AGE)+IMMIGRANT+scale(FAMSIZE)+HLTH_INS+
                 SMOKE,data=sample,
                atmean=TRUE,robust=T)


#However, when I add in the code to cluster SE I receive the error: "Error in scale(AGE) : object 'AGE' not found" 
logit<-logitmfx(HYPERTEN~scale(AGE)+IMMIGRANT+scale(FAMSIZE)+HLTH_INS+
                 SMOKE,data=sample,
                atmean=TRUE,robust=T,clustervar1="PSU", clustervar2=NULL,!is.null("PSU")) 
like image 873
juliah0494 Avatar asked Dec 30 '22 21:12

juliah0494


2 Answers

Trying to replicate the steps of the function using the source code, Steffen Moritz' solution should indeed work. The problem arises, since first logitmfx instantly calls another function logitmfxest.

This function has the same arguments, but it also has the following code:

if(!is.null(clustervar1)){
    if(is.null(clustervar2)){
      if(!(clustervar1 %in% names(data))){
        stop("clustervar1 not in data.frame object")
      }    
      data = data.frame(model.frame(formula, data, na.action=NULL),data[,clustervar1])
      names(data)[dim(data)[2]] = clustervar1
      data=na.omit(data)
    }
    if(!is.null(clustervar2)){
      if(!(clustervar1 %in% names(data))){
        stop("clustervar1 not in data.frame object")
      }    
      if(!(clustervar2 %in% names(data))){
        stop("clustervar2 not in data.frame object")
      }    
      data = data.frame(model.frame(formula, data, na.action=NULL),
                        data[,c(clustervar1,clustervar2)])
      names(data)[c(dim(data)[2]-1):dim(data)[2]] = c(clustervar1,clustervar2)
      data=na.omit(data)
    }
  }

From this the following code gets activated in your case:

if(!is.null(clustervar1)){
    if(is.null(clustervar2)){  
      data = data.frame(model.frame(formula, data, na.action=NULL),data[,clustervar1])
      names(data)[dim(data)[2]] = clustervar1
      data=na.omit(data)
    }
  }

This redefines "data" to be a data.frame build on the model.frame. But the model frame uses names from you formula, so suddenly column 2 is called scale.AGE. and column 3 is called scale.FAMSIZE..

This is a big problem since the function then calls a generalized linear model:

fit = glm(formula, data=data, family = binomial(link = "logit"), x=T, 
        start = start, control = control) 

where it uses your original formula containing scale(AGE) and scale(FAMSIZE), but with the new dataframe with the renamed columns.

So scaling before inputting should work. And indeed any other function will, as Steffen mentioned, cause the same error, since they will produce a similar renaming of columns when model.frame is called.

like image 57
Qwethm Avatar answered Jan 13 '23 11:01

Qwethm


Strange, somehow can't recognize the functions in the formula any more. You can try this, if you remove scale it works fine. Also no other function like log() seems to work.

You could try calculating scale(AGE) before, then you don't need to put it in the formula.

Could look like this:

sample$AGE<-scale(sample$AGE)
sample$FAMSIZE<-scale(sample$FAMSIZE)

form <- as.formula(HYPERTEN~AGE+IMMIGRANT+FAMSIZE+HLTH_INS+SMOKE)
#However, when I add in the code to cluster SE I receive the error: "Error in scale(AGE) : object 'AGE' not found"
logit<-logitmfx(form,data=sample,
                atmean=TRUE,robust=T,clustervar1="PSU", clustervar2=NULL)
like image 21
Steffen Moritz Avatar answered Jan 13 '23 13:01

Steffen Moritz