data.table vs plyr regression output

Tags: r, data.table

The data.table package is very helpful in terms of speed. But I am having trouble actually using the output from a linear regression. Is there an easy way to get the data.table output to be as pretty/useful as that from the plyr package? Below is an example. Thank you!

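library(data.table)
library(plyr)

# 15 rows: 5 observations for each of three IDs, with random y, x and z
REG <- data.table(ID = c(rep('Frank', 5), rep('Tony', 5), rep('Ed', 5)),
                  y = rnorm(15), x = rnorm(15), z = rnorm(15))
REG

# plyr: one row per ID, one labeled column per coefficient
ddply(REG, .(ID), function(df) coef(lm(y ~ x + z, data = df)))

# data.table: the three coefficients per ID are stacked in a single column
REG[, coef(lm(y ~ x + z)), by = ID]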

The data.table coefficient estimates come back stacked in a single column (V1), whereas the plyr/ddply estimates are spread across multiple, nicely labeled columns.

I know I can fit the regression three times per group with data.table, but that seems really inefficient. I could be wrong, though.

REG[, list(Intercept = coef(lm(y ~ x + z))[1],
           x         = coef(lm(y ~ x + z))[2],
           z         = coef(lm(y ~ x + z))[3]), by = ID]
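If the only concern is fitting the model three times per group, one option (a sketch, not taken from the original post) is to fit once inside a `{}` block in `j`, keep the coefficient vector in a local variable, and return a named list so data.table makes one column per element. The `set.seed(1)` call, the `rep(..., each = 5)` shorthand for the same ID column, and the local names `cf` and `res` are assumptions added here for reproducibility and illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
REG <- data.table(ID = rep(c('Frank', 'Tony', 'Ed'), each = 5),
                  y = rnorm(15), x = rnorm(15), z = rnorm(15))

# Fit lm() once per group, keep the coefficient vector, then return a named
# list so data.table turns each element into its own output column.
res <- REG[, {
  cf <- coef(lm(y ~ x + z))
  list(Intercept = cf[[1]], x = cf[[2]], z = cf[[3]])
}, by = ID]
print(res)
```

Compared with calling `coef(lm(...))` three times, this fits each group's model once and also lets you pick syntactic column names such as `Intercept` instead of `(Intercept)`.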
user1491868 asked Jun 29 '12 18:06


1 Answer

Try this:

> REG[, as.list(coef(lm(y ~ x + z))), by=ID];
        ID (Intercept)           x         z
[1,] Frank  -0.2928611  0.07215896  1.835106
[2,]  Tony   0.9120795 -1.11153056  2.041260
[3,]    Ed   1.0498359  5.77131778 -1.253741

I have the nagging feeling that this question was asked less than a week ago, but I don't think I arrived at this approach when I tried it, and I don't remember that any answer was this compact.

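Oh, there it is, on r-help. Matthew can comment on the correctness of this if he wants. I guess the message is that when j returns a list, data.table makes each list element its own column, so as.list() on the named coefficient vector gives one labeled column per coefficient. The interesting thing was that using list(coef(lm(...))) did not succeed in the manner we hoped: that wraps the whole vector as a single list element, so the coefficients still end up stacked in one column.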
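The difference between `list()` and `as.list()` can be seen without fitting any model. The sketch below uses a made-up named vector `cf` standing in for what `coef(lm(...))` returns, and a hypothetical two-group table `dt`; the values and names are illustrative assumptions only.

```r
library(data.table)

# Made-up coefficient vector, shaped like the output of coef(lm(y ~ x + z))
cf <- c("(Intercept)" = 0.5, x = 1.2, z = -0.3)

# Hypothetical table with two groups
dt <- data.table(g = c("a", "b"), v = 1:2)

# list(cf) is a list with ONE element (the whole length-3 vector),
# so each group contributes 3 rows to a single value column.
stacked <- dt[, list(cf), by = g]

# as.list(cf) is a list with THREE named elements,
# so each group contributes 1 row with one column per coefficient.
wide <- dt[, as.list(cf), by = g]

print(stacked)
print(wide)
```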

IRTFM answered Nov 10 '22 19:11