Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Automated formula construction

Tags:

r

modeling

In a recent homework assignment, we were instructed to run 27 linear models, each time adding an additional variable (the goal was to plot the changes in R2 vs. changes in adjusted R2). I found it difficult to algorithmically create formulas like this. The code I ended up using looked like this following (note that the first column in the data frame is the dependent variable, all the rest are prospective independent variables.

 make.formula <- function(howfar) {
  formula <- c()
  for (i in 1:howfar) {
    if (i == 1) {
      formula <- paste(formula, names(d)[i], '~')}
    else if (i == howfar) {
      formula <- paste(formula, names(d)[i], '')
    }
    else {
      formula <- paste(formula, names(d)[i], '+')}
  }
  return(formula)
}

formulas <- lapply(seq(2, length(d)), make.formula)
formulas <- lapply(formulas, as.formula)
fits <- lapply(formulas, lm, data = d)

This works, but seems far from ideal, and my impression is that anything I'm doing with a for-loop in R is probably not being done the best way. Is there an easier way to algorithmically construct formulas for a given data frame?

like image 381
Spencer Boucher Avatar asked Aug 27 '13 19:08

Spencer Boucher


2 Answers

reformulate(), a nifty function for creating formulas from character vectors, might come in handy. Here's an example of what it does:

reformulate(response="Y", termlabels=c("X1", "X2", "X3"))
# Y ~ X1 + X2 + X3

And here's how you might use it in practice. (Note that I here create the formulas inside of the lm() calls. Because formula objects carry with them info about the environment they were created in, I'd be a bit hesitant to create them outside of the lm() call within which you actually want to use them.):

evars <- names(mtcars)[2:5]
ii <- lapply(1:4, seq_len)

lapply(ii, 
       function(X) {
          coef(lm(reformulate(response="mpg", termlabels=evars[X]), data=mtcars))
})
# [[1]]
# (Intercept)         cyl 
#    37.88458    -2.87579 
# 
# [[2]]
# (Intercept)         cyl        disp 
# 34.66099474 -1.58727681 -0.02058363 
# 
# [[3]]
# (Intercept)         cyl        disp          hp 
# 34.18491917 -1.22741994 -0.01883809 -0.01467933 
# 
# [[4]]
# (Intercept)         cyl        disp          hp        drat 
# 23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 
like image 143
Josh O'Brien Avatar answered Sep 22 '22 13:09

Josh O'Brien


Map can be used to solve that problem:

mydata<-mtcars

dep<-as.list(rep("mpg~",(dim(mydata)[2]-1))) # ldependent variables with ~ 
indep1<- as.list( names(mydata)[-1])
indeno<-as.list(1:(dim(mydata)[2]-1))

myreg<-Map(function(x,y) (lm(as.formula(paste(x,paste(unlist(indep[1:y]),collapse="+"))),data=mtcars))$coefficient,dep,indeno)
   > myreg
[[1]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[2]]
(Intercept)         cyl        disp 
34.66099474 -1.58727681 -0.02058363 

[[3]]
(Intercept)         cyl        disp          hp 
34.18491917 -1.22741994 -0.01883809 -0.01467933 

[[4]]
(Intercept)         cyl        disp          hp        drat 
23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 

[[5]]
(Intercept)         cyl        disp          hp        drat          wt 
36.00835689 -1.10748650  0.01235733 -0.02401743  0.95220742 -3.67328708 

[[6]]
(Intercept)         cyl        disp          hp        drat          wt        qsec 
26.30735899 -0.81856023  0.01320490 -0.01792993  1.32040573 -4.19083238  0.40146117 

[[7]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs 
25.88354175 -0.85665309  0.01314097 -0.01733070  1.31265550 -4.22434351  0.44873351 -0.27816899 

[[8]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
15.57313068 -0.27859352  0.01471012 -0.02144242  0.81505862 -3.94373934  0.80975689  0.36835866  2.79374984 

[[9]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.83083549 -0.16881263  0.01623358 -0.02424055  0.70590083 -4.03214213  0.86828517  0.36470431  2.55092849 
       gear 
 0.50293618 

[[10]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.30337416 -0.11144048  0.01333524 -0.02148212  0.78711097 -3.71530393  0.82104075  0.31776281  2.52022689 
       gear        carb 
 0.65541302 -0.19941925 
like image 33
Metrics Avatar answered Sep 24 '22 13:09

Metrics