I've got a data frame containing a vector of x values, a vector of y values, and a vector of IDs:
x <- rep(0:3, 3)
y <- runif(12)
ID <- c(rep("a", 4), rep("b", 4), rep("c", 4))
df <- data.frame(ID=ID, x=x, y=y)
I'd like to create a separate lm for the subset of x's and y's sharing the same ID. The following code gets the job done:
a.lm <- lm(x~y, data=subset(df, ID=="a"))
b.lm <- lm(x~y, data=subset(df, ID=="b"))
c.lm <- lm(x~y, data=subset(df, ID=="c"))
Except that this is very brittle (future data sets might have different IDs) and un-vectorized. I'd also like to store all the lms in a single data structure. There must be an elegant way to do this, but I can't find it. Any help?
To get a subset of a data frame's rows and columns in R, use the subset() function, filter() from the dplyr package, or base R's square-bracket notation, df[]. subset() is a generic R function that returns the selected rows and columns (in R terms, observations and variables) of a data frame.
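As a small illustration of the base R options mentioned above, the sketch below picks out the rows for one ID with subset() and with bracket notation. It assumes a data frame shaped like the one in the question; the fixed seed is an addition so the example is reproducible. dplyr's filter(df, ID == "a") should select the same rows, but it is left out so the sketch needs no extra packages.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(ID = rep(c("a", "b", "c"), each = 4),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# Rows where ID is "a", keeping only the x and y columns
by_subset  <- subset(df, ID == "a", select = c(x, y))
by_bracket <- df[df$ID == "a", c("x", "y")]

# Compare the two results
isTRUE(all.equal(by_subset, by_bracket))
```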
The subset argument of lm() and other model-fitting functions takes a logical vector with one element per row of the data frame, evaluated in the environment of the data frame.
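A minimal sketch of that argument, assuming the data frame from the question (the seed is added only for reproducibility). It fits the model for one ID through subset= and checks that the coefficients match those from subsetting the data first.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(ID = rep(c("a", "b", "c"), each = 4),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# subset is evaluated inside df, so ID here refers to df$ID
a.lm  <- lm(x ~ y, data = df, subset = ID == "a")

# Equivalent fit: subset the data frame before passing it to lm()
a.lm2 <- lm(x ~ y, data = subset(df, ID == "a"))

all.equal(coef(a.lm), coef(a.lm2))
```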
A matrix is subset with two arguments inside single brackets, [], separated by a comma. The first argument specifies the rows, and the second the columns.
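For example, with a small made-up matrix (the name m and its values are purely illustrative):

```r
m <- matrix(1:12, nrow = 3, ncol = 4)  # filled column by column

row2  <- m[2, ]             # second row, all columns
col3  <- m[, 3]             # third column, all rows
block <- m[1:2, c(1, 4)]    # rows 1 and 2, columns 1 and 4
```

Leaving an argument empty, as in m[2, ], selects everything along that dimension.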
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions.
Using base functions, you can split your original dataframe and use lapply on that:
lapply(split(df,df$ID),function(d) lm(x~y,d))
$a
Call:
lm(formula = x ~ y, data = d)
Coefficients:
(Intercept) y
-0.2334 2.8813
$b
Call:
lm(formula = x ~ y, data = d)
Coefficients:
(Intercept) y
0.7558 1.8279
$c
Call:
lm(formula = x ~ y, data = d)
Coefficients:
(Intercept) y
3.451 -7.628
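Since the result is an ordinary named list, the coefficients can be collected into one table. The following is a sketch under the same setup as the question (the seed and the names fits and coefs are assumptions added here); because split() creates one piece per distinct ID, the list adapts to whatever IDs a future data set contains.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(ID = rep(c("a", "b", "c"), each = 4),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# One lm per ID, stored in a list named by ID
fits <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

# One row per ID, one column per coefficient
coefs <- t(sapply(fits, coef))
coefs
```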
How about
library(nlme) ## OR library(lme4)
lmList(x~y|ID,data=df)
?
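A slightly fuller sketch of the nlme variant, assuming the question's data frame (the seed is added for reproducibility, and ID is made a factor here as a precaution rather than because the source requires it):

```r
library(nlme)  # one of R's recommended packages

set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(ID = factor(rep(c("a", "b", "c"), each = 4)),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# x ~ y | ID fits x ~ y separately within each level of ID
fits <- lmList(x ~ y | ID, data = df)

# One row of coefficients per ID
coef(fits)
```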
Use some of the magic in the plyr package. The function dlply takes a data.frame, splits it, applies a function to each element, and combines it into a list. This is perfect for your application.
library(plyr)
#fitList <- dlply(df, .(ID), function(dat)lm(x~y, data=dat))
fitList <- dlply(df, .(ID), lm, formula=x~y) # Edit
This creates a list with a model for each subset of ID:
str(fitList, max.level=1)
List of 3
$ a:List of 12
..- attr(*, "class")= chr "lm"
$ b:List of 12
..- attr(*, "class")= chr "lm"
$ c:List of 12
..- attr(*, "class")= chr "lm"
- attr(*, "split_type")= chr "data.frame"
- attr(*, "split_labels")='data.frame': 3 obs. of 1 variable:
This means you can subset the list and work with that. For example, to get the coefficients for your lm model where ID=="a":
> coef(fitList$a)
(Intercept) y
3.071854 -3.440928
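The same idea extends to any per-model summary. As a sketch, the code below rebuilds fitList with base split() and lapply() so that it does not depend on plyr (an assumption made here for self-containment; a list returned by dlply should be usable in the same way), then pulls one element out by name and computes R-squared for every model. The seed and the name r2 are also additions.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(ID = rep(c("a", "b", "c"), each = 4),
                 x  = rep(0:3, 3),
                 y  = runif(12))

# Stand-in for the dlply() result: a named list of lm objects
fitList <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

# Pick a single model by its ID
coef(fitList[["a"]])

# Apply the same extraction to every model at once
r2 <- sapply(fitList, function(fit) summary(fit)$r.squared)
r2
```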