 

applying lm to multiple datasets

Below are four datasets (I created them randomly just to provide a reproducible example). I put them in a list so that I could apply lm to all of them at once:

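set.seed(1)  # fix the random numbers so the example is reproducible
H<-data.frame(replicate(10,sample(0:20,10,replace=TRUE)))
C<-data.frame(replicate(5,sample(0:100,10,replace=FALSE)))
R<-data.frame(replicate(7,sample(0:30,10,replace=TRUE)))
E<-data.frame(replicate(4,sample(0:40,10,replace=FALSE)))

dsets<-list(H,C,R,E)
# regress X1 (the first column) on all other columns of each data frame
models<-lapply(dsets,function(x)lm(X1~.,data=x))
lapply(models,summary)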

The variables differ between the datasets, both in number and in name (although in this example they are all named X1, X2 and so on). In each dataset, the first column is the response and the remaining columns are the independent variables.
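As a quick check of the naming point above, here is a minimal sketch. It assumes an arbitrary fixed seed, which the question's code does not set. It shows that data.frame() names the columns of an unnamed matrix X1, X2 and so on, which is why the single formula X1 ~ . fits every one of the toy datasets.

```r
# Sketch: data.frame() on an unnamed matrix assigns the names X1, X2, ...
set.seed(1)  # assumption: any seed works; only the column names matter here
H <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))
E <- data.frame(replicate(4, sample(0:40, 10, replace = FALSE)))

names(H)  # "X1" "X2" ... "X10"
names(E)  # "X1" "X2" "X3" "X4"

# Every data frame has a column called X1, so the same formula
# X1 ~ . (first column on all other columns) can be reused for each one.
fit_E <- lm(X1 ~ ., data = E)
names(coef(fit_E))  # "(Intercept)" "X2" "X3" "X4"
```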

This code works, but not on my actual datasets. Because their variables have real names rather than X1, X2 and so on, I referred to the response by its position instead:

   dsets<-list(H,C,R,E)
   models<-lapply(dsets,function(x)lm(x[,1]~.,data=x))
   lapply(models,summary)

With this version the results are wrong: the response variable is also included among the independent variables.
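The following minimal sketch reproduces the symptom. It uses toy data generated as in the question, with an arbitrary seed. The left-hand side x[, 1] is an expression rather than a column name, so the . on the right-hand side still expands to every column of the data, X1 included.

```r
set.seed(1)  # arbitrary seed for the toy data
x <- data.frame(replicate(4, sample(0:40, 10, replace = FALSE)))

# x[, 1] is a vector of values, not the name "X1", so "." still expands to
# every column of x, including X1 itself.
bad <- lm(x[, 1] ~ ., data = x)
names(coef(bad))  # "(Intercept)" "X1" "X2" "X3" "X4"

# The response is regressed on a copy of itself: the X1 coefficient is 1
# (up to rounding) and the fit is perfect.
coef(bad)["X1"]
```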

Could anyone assist?

EDIT: I realized that x[,1] returns the whole column (its values), not the column name, so I tried using the name instead:

   models<-lapply(dsets,function(x)lm(colnames(x)[1]~.,data=x))
   lapply(models,summary)

but this does not work either. I get the following error:

Error in model.frame.default(formula = colnames(H[1]) ~ ., data = H, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'Var1')
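The error arises because colnames(x)[1] in a formula is evaluated as data. It yields the single string "X1" (length 1), while every column has 10 rows, hence "variable lengths differ". The sketch below shows one workaround, which differs from the reformulate() approach in the answer. It pastes the formula together as a string and converts it with as.formula(), so the name is parsed as a column reference. The toy data and seed are assumptions.

```r
set.seed(1)  # arbitrary seed for the toy data
x <- data.frame(replicate(4, sample(0:40, 10, replace = FALSE)))

# colnames(x)[1] is one string, but each column has 10 values.
length(colnames(x)[1])  # 1
nrow(x)                 # 10

# Build "X1 ~ ." as text, then parse it into a real formula.
f <- as.formula(paste(colnames(x)[1], "~ ."))
f                       # X1 ~ .
fit <- lm(f, data = x)
names(coef(fit))        # "(Intercept)" "X2" "X3" "X4"
```

Column names that are not syntactic R names (for example, names containing spaces) would need backquotes inside the pasted string for this to parse.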
oivemaria asked Feb 10 '15 00:02




1 Answer

models <- lapply(dsets,
             function(data){
               # build "<first column name> ~ ." and fit it to this data frame
               lm(reformulate(termlabels=".", response=names(data)[1]), data)
             })

reformulate() constructs a formula from character strings, so the response can be given by its name (names(data)[1]) instead of by its values.
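To illustrate the answer with real column names, the sketch below uses three built-in R datasets (mtcars, iris and trees) as stand-ins for the asker's data. The choice of datasets is an assumption. The first column of each is treated as the response.

```r
# Assumption: built-in datasets stand in for the asker's real data frames.
dsets <- list(
  mtcars = mtcars[, c("mpg", "wt", "hp")],  # mpg is the response
  iris   = iris[, 1:4],                     # Sepal.Length is the response
  trees  = trees                            # Girth is the response
)

# reformulate() turns strings into a formula, e.g. mpg ~ .
reformulate(termlabels = ".", response = "mpg")

models <- lapply(dsets, function(data) {
  lm(reformulate(termlabels = ".", response = names(data)[1]), data = data)
})

# The response is matched by name, so "." expands only to the other columns.
lapply(models, function(m) names(coef(m)))
lapply(models, summary)
```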

stanekam answered Sep 28 '22 13:09