I have the following sample data.table:
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:
dtb[,colSums, by="id"]
This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name
The aggregate() function in R is used to produce summary statistics for one or more variables in a data frame or a data.
In order to use the aggregate function for mean in R, you will need to specify the numerical variable on the first argument, the categorical (as a list) on the second and the function to be applied (in this case mean ) on the third. An alternative is to specify a formula of the form: numerical ~ categorical .
Use the rowSums() Function of Base R to Calculate the Sum of Selected Columns of a Data Frame. We will create a new column using the data_frame$new_column syntax and assign its value using the rowSums() function. The columns to add will be given directly in the function using the subsetting syntax.
Aggregate is a function in base R which can, as the name suggests, aggregate the inputted data. frame d.f by applying a function specified by the FUN parameter to each column of sub-data. frames defined by the by input parameter. The by parameter has to be a list .
this is actually what i was looking for and is mentioned in the FAQ:
dtb[,lapply(.SD,mean),by="id"]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With