I have a data.table with a large number of rows. I want to group the data table by one particular column and apply the same aggregation function to all the other columns. What is the appropriate way of doing that?
Here is some sample code to set up a data table that looks similar to what I have.
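library(data.table)
# 5000 rows x 95 columns of uniform random numbers
my.table.tmp <- matrix(runif(5000*95), nrow=5000)
my.table <- data.table(my.table.tmp)
# Grouping column: 'A'..'E' repeated 1000 times (5000 values in total)
my.table[, gbc:=rep(c('A', 'B', 'C', 'D', 'E'), 1000)]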
I want to group the table by the grouping column gbc, and I want all the remaining 95 columns to be aggregated by a function, let's say mean.
I see that
my.table[, lapply(.SD, mean), by=gbc]
gives me a table with the correct dimensions, but I am not sure if this is doing the right thing. If it is doing the right thing, can someone help me by breaking down what's happening here?
Your description sounds correct. .SD is just the subset of the data for each by= group (all columns except the grouping column), and since a data.frame/data.table is just a list of columns stuck together, lapply will loop over each column, applying the function (here mean) to it.
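To make that concrete, here is a minimal sketch on a tiny table. The table dt and its columns x and y are hypothetical stand-ins for the 95 numeric columns in the question; only gbc is taken from the original code. It shows what .SD contains in each group and checks the lapply result against means written out by hand.

```r
library(data.table)

# Hypothetical small table: two numeric columns plus the grouping column.
dt <- data.table(x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40),
                 gbc = c("A", "B", "A", "B"))

# Inspect .SD: within each group it holds every column except the by= column.
dt[, .(cols = paste(names(.SD), collapse = ", "), n_rows = nrow(.SD)), by = gbc]

# lapply(.SD, mean) returns a named list with one mean per column;
# data.table binds those lists into one row per group.
res <- dt[, lapply(.SD, mean), by = gbc]

# Cross-check against spelling out the means column by column.
manual <- dt[, .(x = mean(x), y = mean(y)), by = gbc]
stopifnot(isTRUE(all.equal(res, manual)))
```

If only some columns should be aggregated, the .SDcols argument restricts what ends up in .SD, e.g. dt[, lapply(.SD, mean), by = gbc, .SDcols = "x"].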