
regression by group

Just a quick question: I want to run negative binomial regressions using MASS. The dependent variables are val1, val2, and val3, and the independent variables are a, b, c, and d.

Here is some fake data:

library(data.table)
library(MASS)
test <- data.table(val1 = 1:10, val2 = 11:20, val3 = 21:30, a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
summary1 <- glm.nb(val1 ~ a + b + c + d, data = test)
summary2 <- glm.nb(val2 ~ a + b + c + d, data = test)
summary3 <- glm.nb(val3 ~ a + b + c + d, data = test)

I think this code is ugly, so I tried this:

for (i in c("val1", "val2", "val3")){
paste("sum_", c("val1", "val2", "val3"), sep = "") <- glm.nb(i ~ a + b + c + d, data = simple)
}

But it didn't work. Any suggestions for improvement? In the original data there are about 26 independent variables, and the code gets even uglier when it is written out like this: sum1 <- glm.nb(val3 ~ a + b + c + d + e + f + g + h + i + j + k + l, data = test)

I know the following code might be helpful, but I don't know how to use it:

diff <- setdiff(colnames(test),c('val1','val2','val3'))

Also, I wonder whether lapply function can achieve this within data.table?

Thanks a lot!

Bigchao asked Feb 14 '23 02:02


2 Answers

It is better to put your data in long format:

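library(plyr)
library(reshape2)
# Stack val1..val3 into a single `value` column, labelled by `variable`
xx <- melt(test, measure.vars = paste0('val', 1:3))
# Fit one model per response and return its coefficients as one row each
ddply(xx, .(variable), function(x) {
  coef(glm.nb(value ~ ., data = subset(x, select = -variable)))
})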

 variable (Intercept)            a            b           c          d
1     val1    1.583602 -0.045909060 -0.018189342 0.026293033 0.29708648
2     val2    2.704601 -0.014641683 -0.003836401 0.006711503 0.10445377
3     val3    3.217729 -0.008925782 -0.001863267 0.003475509 0.06292286

If you want the full model objects, not just the coefficients:

dlply(xx,.(variable),function(x){
  glm.nb(value~.,data=subset(x,select=-variable))
})
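
For what it's worth, the same long-format idea can be sketched in base R plus MASS, without plyr or reshape2. This is only an illustration: the data frame `wide`, its two predictors, and the simulated counts are made up for the example (assumptions, not the asker's data), and `split()` with `lapply()` stands in for `dlply()`.

```r
library(MASS)  # provides glm.nb()

set.seed(42)
n <- 200
# Hypothetical wide data: two predictors and two overdispersed count responses
wide <- data.frame(a = rnorm(n), b = rnorm(n))
wide$val1 <- rnbinom(n, size = 2, mu = exp(0.5 + 0.3 * wide$a))
wide$val2 <- rnbinom(n, size = 2, mu = exp(1.0 + 0.2 * wide$b))

# Long format: stack the response columns, repeating the predictors for each
responses <- c("val1", "val2")
long <- do.call(rbind, lapply(responses, function(r) {
  data.frame(wide[c("a", "b")], variable = r, value = wide[[r]])
}))

# Split by response and fit one model per piece;
# `value ~ .` uses every remaining column as a predictor
models <- lapply(split(long, long$variable), function(x) {
  glm.nb(value ~ ., data = x[, names(x) != "variable"])
})

# One row of coefficients per response
coefs <- do.call(rbind, lapply(models, coef))
```

Here `models` is a named list, so `models$val1` is the full fitted object and `summary(models$val1)` works as usual.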

agstudy answered Feb 16 '23 15:02


Using your loop approach, I would simply store all the models in a list, like so:

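results <- list()

for (i in c("val1", "val2", "val3")){
  # Build the formula from a string, e.g. "val1 ~ a + b + c + d"
  frml <- paste(i, "~ a + b + c + d")
  frml <- as.formula(frml)

  # `test` is the data.table defined in the question
  results[[i]] <- glm.nb(frml, data = test)
}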

And then access the models in the list by looking at results$val1 etc.
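
With around 26 predictors, typing out the right-hand side gets tedious. A possible variant, sketched here under assumptions, combines the `setdiff()` line from the question with `reformulate()` and `lapply()`. The simulated data frame below is hypothetical (a plain data.frame with made-up counts so the example runs on its own); the column names follow the question.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
n <- 200
# Hypothetical data: four predictors and three overdispersed count responses
test <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
test$val1 <- rnbinom(n, size = 2, mu = exp(0.5 + 0.3 * test$a))
test$val2 <- rnbinom(n, size = 2, mu = exp(1.0 + 0.2 * test$b))
test$val3 <- rnbinom(n, size = 2, mu = exp(1.5 - 0.2 * test$c))

responses  <- c("val1", "val2", "val3")
# Every column that is not a response is treated as a predictor
predictors <- setdiff(colnames(test), responses)

# Fit one model per response; reformulate() builds e.g. val1 ~ a + b + c + d
models <- lapply(setNames(responses, responses), function(y) {
  glm.nb(reformulate(predictors, response = y), data = test)
})

# One row of coefficients per response
coefs <- t(sapply(models, coef))
```

The list is named by response, so `models$val1` behaves like `results$val1` above, and adding more predictor columns to the data needs no change to the fitting code.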

Stuples answered Feb 16 '23 17:02