Below are 4 datasets (I've just created them randomly for the sake of providing a reproducible code). I created a list of these so I could apply "lm" to these multiple datasets at once :
H<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
C<-data.frame(replicate(5,sample(0:100,10,rep=FALSE)))
R<-data.frame(replicate(7,sample(0:30,10,rep=TRUE)))
E<-data.frame(replicate(4,sample(0:40,10,rep=FALSE)))
dsets<-list(H,C,R,E)
models<-lapply(dsets,function(x)lm(X1~.,data=x))
lapply(models,summary)
The variables in each of the datasets are different (in count as well as names. However,if you run the code they will all be x1,x2..and so on). The first column/variable in each will be the response and rest would be the independent variables.
This code works but not on my actual dataset. Since my datasets have actual names for variables, I used the position of the variable instead as below:
dsets<-list(H,C,R,E)
models<lapply(dsets,function(x)lm(x[,1]~.,data=x))
lapply(models,summary)
Using the above, the results are messed up. It also includes the response variable as the independent variable.
Could anyone assist?
EDIT: I realized that x[,1] is calling the whole column and not the column name
models<lapply(dsets,function(x)lm(colnames(x)[1]~.,data=x))
lapply(models,summary)
but this doesn't work either. I get the following error
Error in model.frame.default(formula = colnames(H[1]) ~ ., data = H, drop.unused.levels = TRUE) :
variable lengths differ (found for 'Var1')
models <- lapply(dsets,
function(data){
lm(reformulate(termlabels=".", response=names(data)[1]), data)
})
reformulate
allows you to construct a formula from character
strings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With