 

Applying a function to all columns of a data.table together with a group-by

I have a data.table with a large number of rows. I want to group the data table by one particular column, and I want to apply the same aggregation function to all the other columns. What is the appropriate way of doing that?

Here is some sample code to set up a data table that looks similar to what I have.

library(data.table)

my.table.tmp <- matrix(runif(5000*95), nrow=5000)
my.table <- data.table(my.table.tmp)
my.table[, gbc:=rep(c('A', 'B', 'C', 'D', 'E'), 1000)]

I want to group the table by the column gbc and aggregate each of the remaining 95 columns with the same function, say mean.

I see that

my.table[, lapply(.SD, mean), by=gbc]

gives me a table with the correct dimensions, but I am not sure if this is doing the right thing. If it is doing the right thing, can someone help me by breaking down what's happening here?
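One way to check that the result is right is to compare it against a per-group computation done by hand. The sketch below uses a smaller stand-in table (the name `dt`, its size and the seed are arbitrary choices for illustration, not part of the question) and checks group "A" against `colMeans()` on that group's rows:

```r
library(data.table)

set.seed(1)
# Small stand-in for my.table: 3 numeric columns (V1..V3) plus a grouping column
dt <- data.table(matrix(runif(20 * 3), nrow = 20))
dt[, gbc := rep(c("A", "B", "C", "D", "E"), 4)]

# Grouped aggregation: one row per group, one column per non-grouping column
res <- dt[, lapply(.SD, mean), by = gbc]

# Manual check for group "A": keep its rows, drop gbc, take column means
manual_A <- colMeans(dt[gbc == "A", !"gbc"])
stopifnot(isTRUE(all.equal(unlist(res[gbc == "A", !"gbc"]), manual_A)))

print(dim(res))  # 5 groups x (gbc + 3 aggregated columns)
```

The same comparison can be run for any other group or on the full `my.table`.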

asked Nov 08 '22 19:11 by CCG



1 Answer

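Your reading is correct. Inside j, .SD is a data.table holding the current by= group's rows for all columns except the grouping column(s). Since a data.frame/data.table is just a list of columns stuck together, lapply() loops over each of those columns and applies your function (here mean) to it. The resulting list becomes one row per group, with gbc prepended.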
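A small sketch of the pieces described above, on a toy table with hypothetical columns `x`, `y` and `gbc`: the table is a list, `.SD` excludes the grouping column, and `.SDcols` can restrict which columns `.SD` contains:

```r
library(data.table)

dt <- data.table(x = 1:6, y = c(2, 4, 6, 8, 10, 12),
                 gbc = rep(c("A", "B"), each = 3))

# A data.table is a list of columns, so lapply() visits each column in turn
stopifnot(is.list(dt))
col_means <- lapply(dt[, .(x, y)], mean)   # list(x = 3.5, y = 7)

# Inside j, .SD holds the current group's rows for every column except gbc
sd_names <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = gbc]

# lapply over .SD gives one mean per column, per group
res <- dt[, lapply(.SD, mean), by = gbc]   # A: x = 2, y = 4; B: x = 5, y = 10

# .SDcols limits the columns that .SD (and therefore lapply) sees
res_x <- dt[, lapply(.SD, mean), by = gbc, .SDcols = "x"]
```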

answered Nov 15 '22 07:11 by thelatemail
