The data.table package is very helpful in terms of speed. But I am having trouble actually using the output from a linear regression. Is there an easy way to get the data.table output to be as pretty/useful as that from the plyr package? Below is an example. Thank you!
library('data.table');
library('plyr');
REG <- data.table(ID=c(rep('Frank',5),rep('Tony',5),rep('Ed',5)), y=rnorm(15), x=rnorm(15), z=rnorm(15));
REG;
ddply(REG, .(ID), function(x) coef(lm(y ~ x + z, data=x)));
REG[, coef(lm(y ~ x + z)), by=ID];
The data.table coefficient estimates are output in a single column whereas the plyr/ddply coefficient estimates are output in multiple and nicely labeled columns.
I know I can run the regression three times with data.table but that seems really inefficient. I could be wrong, though.
REG[, Intercept=coef(lm(y ~ x + z))[1],
x =coef(lm(y ~ x + z))[2],
z =coef(lm(y ~ x + z))[3], by=ID];
Try this:
> REG[, as.list(coef(lm(y ~ x + z))), by=ID];
ID (Intercept) x z
[1,] Frank -0.2928611 0.07215896 1.835106
[2,] Tony 0.9120795 -1.11153056 2.041260
[3,] Ed 1.0498359 5.77131778 -1.253741
I have the nagging feeling that this question was asked less than a week ago, but I don't think I arrived at this approach when I tried it and I don't remember than any answer was this compact.
Oh, there it is .. on r-help. Matthew can comment on the rightfulness of this if he wants. I guess the message is that functions returning lists will not have dimensions dropped. The interesting thing was the using list(coef(lm(...))
did not succeed in the manner we hoped.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With