Automated formula construction

Question

In a recent homework assignment, we were instructed to run 27 linear models, each time adding an additional variable (the goal was to plot the changes in R² vs. changes in adjusted R²). I found it difficult to algorithmically create formulas like this. The code I ended up using looked like this following (note that the first column in the data frame is the dependent variable, all the rest are prospective independent variables.

 make.formula <- function(howfar) {
  formula <- c()
  for (i in 1:howfar) {
    if (i == 1) {
      formula <- paste(formula, names(d)[i], '~')}
    else if (i == howfar) {
      formula <- paste(formula, names(d)[i], '')
    }
    else {
      formula <- paste(formula, names(d)[i], '+')}
  }
  return(formula)
}

formulas <- lapply(seq(2, length(d)), make.formula)
formulas <- lapply(formulas, as.formula)
fits <- lapply(formulas, lm, data = d)

This works, but seems far from ideal, and my impression is that anything I'm doing with a for-loop in R is probably not being done the best way. Is there an easier way to algorithmically construct formulas for a given data frame?

Josh O'Brien · Accepted Answer

reformulate(), a nifty function for creating formulas from character vectors, might come in handy. Here's an example of what it does:

reformulate(response="Y", termlabels=c("X1", "X2", "X3"))
# Y ~ X1 + X2 + X3

And here's how you might use it in practice. (Note that I here create the formulas inside of the lm() calls. Because formula objects carry with them info about the environment they were created in, I'd be a bit hesitant to create them outside of the lm() call within which you actually want to use them.):

evars <- names(mtcars)[2:5]
ii <- lapply(1:4, seq_len)

lapply(ii, 
       function(X) {
          coef(lm(reformulate(response="mpg", termlabels=evars[X]), data=mtcars))
})
# [[1]]
# (Intercept)         cyl 
#    37.88458    -2.87579 
# 
# [[2]]
# (Intercept)         cyl        disp 
# 34.66099474 -1.58727681 -0.02058363 
# 
# [[3]]
# (Intercept)         cyl        disp          hp 
# 34.18491917 -1.22741994 -0.01883809 -0.01467933 
# 
# [[4]]
# (Intercept)         cyl        disp          hp        drat 
# 23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553

Metrics · Answer

Map can be used to solve that problem:

mydata<-mtcars

dep<-as.list(rep("mpg~",(dim(mydata)[2]-1))) # ldependent variables with ~ 
indep1<- as.list( names(mydata)[-1])
indeno<-as.list(1:(dim(mydata)[2]-1))

myreg<-Map(function(x,y) (lm(as.formula(paste(x,paste(unlist(indep[1:y]),collapse="+"))),data=mtcars))$coefficient,dep,indeno)
   > myreg
[[1]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[2]]
(Intercept)         cyl        disp 
34.66099474 -1.58727681 -0.02058363 

[[3]]
(Intercept)         cyl        disp          hp 
34.18491917 -1.22741994 -0.01883809 -0.01467933 

[[4]]
(Intercept)         cyl        disp          hp        drat 
23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 

[[5]]
(Intercept)         cyl        disp          hp        drat          wt 
36.00835689 -1.10748650  0.01235733 -0.02401743  0.95220742 -3.67328708 

[[6]]
(Intercept)         cyl        disp          hp        drat          wt        qsec 
26.30735899 -0.81856023  0.01320490 -0.01792993  1.32040573 -4.19083238  0.40146117 

[[7]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs 
25.88354175 -0.85665309  0.01314097 -0.01733070  1.31265550 -4.22434351  0.44873351 -0.27816899 

[[8]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
15.57313068 -0.27859352  0.01471012 -0.02144242  0.81505862 -3.94373934  0.80975689  0.36835866  2.79374984 

[[9]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.83083549 -0.16881263  0.01623358 -0.02424055  0.70590083 -4.03214213  0.86828517  0.36470431  2.55092849 
       gear 
 0.50293618 

[[10]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.30337416 -0.11144048  0.01333524 -0.02148212  0.78711097 -3.71530393  0.82104075  0.31776281  2.52022689 
       gear        carb 
 0.65541302 -0.19941925

Automated formula construction

Tags:

r

modeling

Spencer Boucher

2 Answers

Josh O'Brien

Metrics

Recent Activity

Donate For Us

Automated formula construction

Tags:

r

modeling

Spencer Boucher

2 Answers

Josh O'Brien

Metrics

Related questions

Recent Activity

Donate For Us