 

ddply with lm() function

Tags: dataframe, r, plyr

How can I use the ddply function to fit a linear model for each group (here, each level of x3)?

x1 <- c(1:10, 1:10)
x2 <- c(1:5, 1:5, 1:5, 1:5)
x3 <- c(rep(1,5), rep(2,5), rep(1,5), rep(2,5))

set.seed(123)
y <- rnorm(20, 10, 3)
mydf <- data.frame(x1, x2, x3, y)

require(plyr)
ddply(mydf, mydf$x3, .fun = lm(mydf$y ~ mydf$X1 + mydf$x2)) 

This produces the following error:

Error in model.frame.default(formula = mydf$y ~ mydf$X1 + mydf$x2, drop.unused.levels = TRUE) : invalid type (NULL) for variable 'mydf$X1'
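The NULL in that message comes from the formula itself: R names are case-sensitive, and the data frame has a column x1 but no column X1, so mydf$X1 evaluates to NULL. A quick check, rebuilding mydf exactly as above:

```r
# Rebuild the data exactly as in the question
x1 <- c(1:10, 1:10)
x2 <- c(1:5, 1:5, 1:5, 1:5)
x3 <- c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5))
set.seed(123)
y <- rnorm(20, 10, 3)
mydf <- data.frame(x1, x2, x3, y)

names(mydf)        # "x1" "x2" "x3" "y"
is.null(mydf$X1)   # TRUE: there is no column called X1
is.null(mydf$x1)   # FALSE: the column is x1 (lower case)
```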

Any help would be appreciated.

asked Sep 23 '11 at 01:09 by jon


1 Answer

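The original call has a few problems: mydf$X1 is NULL because R names are case-sensitive (the column is x1); the grouping variable should be given as .(x3) rather than mydf$x3; .fun must be a function, not the result of calling lm() on the whole data set; and ddply expects each piece to return a data frame, whereas lm returns a model object. Use dlply instead, which keeps one fitted model per group in a list: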

mods = dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)
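This works because dlply calls lm(piece, formula = y ~ x1 + x2) for each subset: with formula supplied by name, the unnamed subset fills lm's next positional argument, data. For comparison, here is a minimal sketch of the same per-group fit in base R using split() and lapply(); this is a swapped-in alternative technique, not part of plyr, and it assumes plyr is installed for the dlply line:

```r
library(plyr)

# Data from the question
x1 <- c(1:10, 1:10)
x2 <- c(1:5, 1:5, 1:5, 1:5)
x3 <- c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5))
set.seed(123)
y <- rnorm(20, 10, 3)
mydf <- data.frame(x1, x2, x3, y)

# plyr version from the answer: each subset of mydf is passed to lm()
# positionally and, because `formula` is named, it lands in lm's `data` argument
mods <- dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)

# Base-R alternative (different technique, no plyr): split by x3, fit each piece
mods_base <- lapply(split(mydf, mydf$x3),
                    function(d) lm(y ~ x1 + x2, data = d))

length(mods)                                       # 2: one model per level of x3
all.equal(coef(mods[[1]]), coef(mods_base[[1]]))   # TRUE
```

Both approaches produce the same fits; dlply additionally records the grouping values, which is what lets ldply add the x3 column when you combine results later.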

mods is a list of two fitted lm objects, one per level of x3. You can extract whatever you need from it; for example, to extract the coefficients, you could write

coefs = ldply(mods, coef)

This gives you

  x3 (Intercept)         x1 x2
1  1    11.71015 -0.3193146 NA
2  2    21.83969 -1.4677690 NA
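To see where the NA for x2 comes from, compare x1 and x2 within each level of x3. A minimal sketch, rebuilding mydf as in the question and refitting the x3 == 1 group by hand:

```r
# Data from the question
x1 <- c(1:10, 1:10)
x2 <- c(1:5, 1:5, 1:5, 1:5)
x3 <- c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5))
set.seed(123)
y <- rnorm(20, 10, 3)
mydf <- data.frame(x1, x2, x3, y)

g1 <- subset(mydf, x3 == 1)
g2 <- subset(mydf, x3 == 2)

all(g1$x2 == g1$x1)       # TRUE: in group 1, x2 is a copy of x1
all(g2$x2 == g2$x1 - 5)   # TRUE: in group 2, x2 is x1 shifted down by 5

# x2 adds no information beyond x1, so lm() reports its coefficient as NA
fit1 <- lm(y ~ x1 + x2, data = g1)
is.na(coef(fit1)[["x2"]])  # TRUE
```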

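The NA for x2 is expected rather than a bug: within each level of x3, x2 is an exact linear function of x1 (x2 equals x1 when x3 is 1, and x1 - 5 when x3 is 2), so lm cannot estimate a separate coefficient for it and reports NA.

EDIT. If you want an ANOVA table for each model, you can just do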

ldply(mods, anova)

  x3 Df    Sum Sq   Mean Sq   F value     Pr(>F)
1  1  1  2.039237  2.039237 0.4450663 0.52345980
2  1  8 36.654982  4.581873        NA         NA
3  2  1 43.086916 43.086916 4.4273907 0.06849533
4  2  8 77.855187  9.731898        NA         NA
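ldply stacks the per-group anova tables but drops their row names, so the output above does not say which row is which: within each x3 group, the first row is the x1 term and the second is the residuals (Df 8 = 10 observations minus 2 estimated coefficients, since x2 is aliased). If you want the term labels kept, one option is a small wrapper around anova. The helper name anova_with_terms is hypothetical, and the sketch assumes plyr is installed:

```r
library(plyr)

# Data from the question
x1 <- c(1:10, 1:10)
x2 <- c(1:5, 1:5, 1:5, 1:5)
x3 <- c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5))
set.seed(123)
y <- rnorm(20, 10, 3)
mydf <- data.frame(x1, x2, x3, y)

mods <- dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)

# Hypothetical helper: copy the anova row names ("x1", "Residuals") into an
# explicit 'term' column so they survive ldply's row-binding;
# check.names = FALSE keeps column names such as "Sum Sq" and "Pr(>F)"
anova_with_terms <- function(m) {
  a <- as.data.frame(anova(m))
  data.frame(term = rownames(a), a, check.names = FALSE)
}

aov_tab <- ldply(mods, anova_with_terms)
aov_tab
```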
answered Oct 24 '22 at 23:10 by Ramnath