Apply lm to subset of data frame defined by a third column of the frame

Tags:

dataframe

r

vectorization

I've got a data frame containing a vector of x values, a vector of y values, and a vector of IDs:

x <- rep(0:3, 3)
y <- runif(12)
ID <- c(rep("a", 4), rep("b", 4), rep("c", 4))
df <- data.frame(ID=ID, x=x, y=y)

I'd like to create a separate lm for the subset of x's and y's sharing the same ID. The following code gets the job done:

a.lm <- lm(x~y, data=subset(df, ID=="a"))
b.lm <- lm(x~y, data=subset(df, ID=="b"))
c.lm <- lm(x~y, data=subset(df, ID=="c"))

Except that this is very brittle (future data sets might have different IDs) and un-vectorized. I'd also like to store all the lms in a single data structure. There must be an elegant way to do this, but I can't find it. Any help?

asked Sep 14 '11 10:09

Drew Steen

3 Answers

Using base R functions, you can split your original data frame by ID and call lapply on the resulting list of pieces:

lapply(split(df,df$ID),function(d) lm(x~y,d))
$a

Call:
lm(formula = x ~ y, data = d)

Coefficients:
(Intercept)            y  
    -0.2334       2.8813  


$b

Call:
lm(formula = x ~ y, data = d)

Coefficients:
(Intercept)            y  
     0.7558       1.8279  


$c

Call:
lm(formula = x ~ y, data = d)

Coefficients:
(Intercept)            y  
      3.451       -7.628
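Because lapply returns an ordinary named list (one element per ID), all the models can be summarised in one step with sapply. This is a minimal sketch that assumes the question's data; set.seed and the names fits and coefs are additions for illustration only:

```r
set.seed(42)  # added for reproducibility; not part of the original question
df <- data.frame(ID = rep(c("a", "b", "c"), each = 4),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# One lm per ID, stored in a named list (names are the levels of df$ID)
fits <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

# Coefficient matrix: one column per ID, one row per model term
coefs <- sapply(fits, coef)
print(coefs)

# R-squared of every per-ID model, as a named numeric vector
print(sapply(fits, function(m) summary(m)$r.squared))
```

New IDs in future data sets are picked up automatically, since split() creates one piece per distinct value of df$ID.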

answered Oct 23 '22 13:10

James

How about lmList()? It fits one lm per level of the grouping variable given after the |:

library(nlme) ## OR library(lme4)
lmList(x ~ y | ID, data = df)  # the question's data frame is df, not d
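A slightly fuller sketch of the lmList approach, assuming the nlme package (lme4 also provides an lmList()). It rebuilds the question's data with ID as an explicit factor and a fixed seed, both added here only for reproducibility; fits is an illustrative name:

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(42)  # for reproducibility only; not in the original question
df <- data.frame(ID = factor(rep(c("a", "b", "c"), each = 4)),  # grouping factor
                 x  = rep(0:3, 3),
                 y  = runif(12))

# One lm(x ~ y) per level of ID, returned as a single "lmList" object
fits <- lmList(x ~ y | ID, data = df)

# Coefficients as a data frame: one row per ID, one column per term
print(coef(fits))

# Each element is still an ordinary lm object
summary(fits[["a"]])
```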

answered Oct 23 '22 13:10

Ben Bolker

Use some of the magic in the plyr package. The function dlply takes a data.frame, splits it by one or more variables, applies a function to each piece, and combines the results into a list. This is perfect for your application.

library(plyr)
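# Explicit form with an anonymous function:
# fitList <- dlply(df, .(ID), function(dat) lm(x ~ y, data = dat))
# Shorter form: formula is named, so each piece is passed on as lm()'s data argument
fitList <- dlply(df, .(ID), lm, formula = x ~ y)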

This creates a list with a model for each subset of ID:

str(fitList, max.level=1)

List of 3
 $ a:List of 12
  ..- attr(*, "class")= chr "lm"
 $ b:List of 12
  ..- attr(*, "class")= chr "lm"
 $ c:List of 12
  ..- attr(*, "class")= chr "lm"
 - attr(*, "split_type")= chr "data.frame"
 - attr(*, "split_labels")='data.frame':    3 obs. of  1 variable:

This means you can subset the list and work with that. For example, to get the coefficients for your lm model where ID=="a":

> coef(fitList$a)
(Intercept)           y 
   3.071854   -3.440928
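To get the coefficients of every model at once rather than one ID at a time, plyr's ldply can apply coef() to each element of fitList and row-bind the results. A minimal sketch, assuming the question's data (set.seed added for reproducibility) and the anonymous-function form of dlply; coefTable is an illustrative name:

```r
library(plyr)

set.seed(42)  # for reproducibility only; not in the original question
df <- data.frame(ID = rep(c("a", "b", "c"), each = 4),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# One lm per ID, returned as a named list
fitList <- dlply(df, .(ID), function(dat) lm(x ~ y, data = dat))

# Apply coef() to every model and stack the results: one row per ID
coefTable <- ldply(fitList, coef)
print(coefTable)
```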

answered Oct 23 '22 13:10

Andrie
