In a recent homework assignment, we were instructed to run 27 linear models, each time adding an additional variable (the goal was to plot the changes in R2 vs. changes in adjusted R2). I found it difficult to algorithmically create formulas like this. The code I ended up using looked like this following (note that the first column in the data frame is the dependent variable, all the rest are prospective independent variables.
make.formula <- function(howfar) {
formula <- c()
for (i in 1:howfar) {
if (i == 1) {
formula <- paste(formula, names(d)[i], '~')}
else if (i == howfar) {
formula <- paste(formula, names(d)[i], '')
}
else {
formula <- paste(formula, names(d)[i], '+')}
}
return(formula)
}
formulas <- lapply(seq(2, length(d)), make.formula)
formulas <- lapply(formulas, as.formula)
fits <- lapply(formulas, lm, data = d)
This works, but seems far from ideal, and my impression is that anything I'm doing with a for-loop in R is probably not being done the best way. Is there an easier way to algorithmically construct formulas for a given data frame?
reformulate()
, a nifty function for creating formulas from character vectors, might come in handy. Here's an example of what it does:
reformulate(response="Y", termlabels=c("X1", "X2", "X3"))
# Y ~ X1 + X2 + X3
And here's how you might use it in practice. (Note that I here create the formulas inside of the lm()
calls. Because formula
objects carry with them info about the environment they were created in, I'd be a bit hesitant to create them outside of the lm()
call within which you actually want to use them.):
evars <- names(mtcars)[2:5]
ii <- lapply(1:4, seq_len)
lapply(ii,
function(X) {
coef(lm(reformulate(response="mpg", termlabels=evars[X]), data=mtcars))
})
# [[1]]
# (Intercept) cyl
# 37.88458 -2.87579
#
# [[2]]
# (Intercept) cyl disp
# 34.66099474 -1.58727681 -0.02058363
#
# [[3]]
# (Intercept) cyl disp hp
# 34.18491917 -1.22741994 -0.01883809 -0.01467933
#
# [[4]]
# (Intercept) cyl disp hp drat
# 23.98524441 -0.81402201 -0.01389625 -0.02317068 2.15404553
Map
can be used to solve that problem:
mydata<-mtcars
dep<-as.list(rep("mpg~",(dim(mydata)[2]-1))) # ldependent variables with ~
indep1<- as.list( names(mydata)[-1])
indeno<-as.list(1:(dim(mydata)[2]-1))
myreg<-Map(function(x,y) (lm(as.formula(paste(x,paste(unlist(indep[1:y]),collapse="+"))),data=mtcars))$coefficient,dep,indeno)
> myreg
[[1]]
(Intercept) cyl
37.88458 -2.87579
[[2]]
(Intercept) cyl disp
34.66099474 -1.58727681 -0.02058363
[[3]]
(Intercept) cyl disp hp
34.18491917 -1.22741994 -0.01883809 -0.01467933
[[4]]
(Intercept) cyl disp hp drat
23.98524441 -0.81402201 -0.01389625 -0.02317068 2.15404553
[[5]]
(Intercept) cyl disp hp drat wt
36.00835689 -1.10748650 0.01235733 -0.02401743 0.95220742 -3.67328708
[[6]]
(Intercept) cyl disp hp drat wt qsec
26.30735899 -0.81856023 0.01320490 -0.01792993 1.32040573 -4.19083238 0.40146117
[[7]]
(Intercept) cyl disp hp drat wt qsec vs
25.88354175 -0.85665309 0.01314097 -0.01733070 1.31265550 -4.22434351 0.44873351 -0.27816899
[[8]]
(Intercept) cyl disp hp drat wt qsec vs am
15.57313068 -0.27859352 0.01471012 -0.02144242 0.81505862 -3.94373934 0.80975689 0.36835866 2.79374984
[[9]]
(Intercept) cyl disp hp drat wt qsec vs am
12.83083549 -0.16881263 0.01623358 -0.02424055 0.70590083 -4.03214213 0.86828517 0.36470431 2.55092849
gear
0.50293618
[[10]]
(Intercept) cyl disp hp drat wt qsec vs am
12.30337416 -0.11144048 0.01333524 -0.02148212 0.78711097 -3.71530393 0.82104075 0.31776281 2.52022689
gear carb
0.65541302 -0.19941925
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With