
Aggregate all columns of data.table, without having to reference them by name

I'd like to do the equivalent of the following, but with data.table's "by":

library(data.table)
dt <- data.table(V1=rnorm(100), V2=rnorm(100), V3=rnorm(100),  # ... and so on up to V100
                 group=rbinom(100,2,.5))
dt.agg <- aggregate(dt, by=list(dt$group), FUN=mean)

I know that I could do this:

dt.agg <- dt[, list(V1=mean(V1), V2=mean(V2), V3=mean(V3)), by=group]

But in my case I have around 100 columns (V1 to V100), and I always want to aggregate all of them by a single factor, as in the aggregate call above, so writing out every column by name as in the data.table solution above isn't feasible.

asked Aug 06 '13 21:08 by stackoverflax


1 Answer

dt[, lapply(.SD, mean), by=group]

To restrict the aggregation to specific columns, use .SDcols:

dt[, lapply(.SD, mean), by=group, .SDcols=c("V1", "V2", "V3")]  # list further columns as needed
dt[, lapply(.SD, mean), by=group, .SDcols=names(dt)[1:100]]     # or select the columns by position
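
For anyone who wants to try this end to end, here is a minimal sketch on the three-column example from the question. The seed, the names `agg_all`, `agg_sel` and `value_cols`, and the `grep` pattern are my own assumptions for illustration, not part of the original question or answer.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the sketch is reproducible
dt <- data.table(V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100),
                 group = rbinom(100, 2, .5))

# .SD holds every column except the ones named in `by`,
# so this takes the mean of V1, V2 and V3 within each group
agg_all <- dt[, lapply(.SD, mean), by = group]

# Same aggregation, but with the columns chosen by name pattern
# (hypothetical pattern: every column whose name starts with "V")
value_cols <- grep("^V", names(dt), value = TRUE)
agg_sel <- dt[, lapply(.SD, mean), by = group, .SDcols = value_cols]

# Cross-check one cell against a direct computation
g <- agg_all$group[1]
check <- all.equal(agg_all$V1[1], mean(dt$V1[dt$group == g]))
```

Building the `.SDcols` vector from `names(dt)` avoids typing out 100 column names, and it also keeps working if the grouping column is not the last one.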
answered Oct 20 '22 00:10 by Señor O