I have the following sample data.table:
library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:
dtb[,colSums, by="id"]
This is just a sample; my real table has many columns, so I want to avoid listing all of them in the function call.
The aggregate() function in R produces summary statistics for one or more variables in a data frame or a data.table.
To use aggregate() to compute a mean in R, pass the numeric variable as the first argument, the grouping (categorical) variable as a list as the second, and the function to apply (in this case mean) as the third. Alternatively, specify a formula of the form numerical ~ categorical.
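A minimal sketch of both calling styles, using a small hypothetical data frame `df` (the values are made up for illustration). The last call uses `cbind()` on the left-hand side of the formula, which keeps several numeric columns separate in the result:

```r
# Hypothetical example data: two numeric columns and a grouping column
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), id = c(1, 1, 2, 2))

# Style 1: numeric variable, grouping variable(s) as a list, function to apply
m1 <- aggregate(x = df$a, by = list(id = df$id), FUN = mean)

# Style 2: formula interface, numerical ~ categorical
m2 <- aggregate(a ~ id, data = df, FUN = mean)

# Several numeric columns kept separate via cbind() in the formula
m3 <- aggregate(cbind(a, b) ~ id, data = df, FUN = mean)
print(m3)
```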
Use the rowSums() function from base R to calculate the sum of selected columns of a data frame. Create a new column with the data_frame$new_column syntax and assign it the result of rowSums(); the columns to add are selected directly inside the call using subsetting syntax.
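A short sketch of that approach on a hypothetical data frame. Note that rowSums() sums across columns within each row, which is a different operation from the per-group column sums asked about above:

```r
# Hypothetical example data
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(100, 200, 300))

# Sum only columns a and b for every row and store the result in a new column
df$new_column <- rowSums(df[, c("a", "b")])
print(df)
```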
aggregate() is a base R function that, as the name suggests, aggregates its input: it applies the function given by the FUN parameter to each column of the sub-data frames of the input data frame d.f, where the sub-data frames are defined by the by parameter. The by parameter has to be a list.
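Applied to the original question, this means every non-grouping column can be summed per group without naming each one. A sketch on a hypothetical data frame, where `setdiff()` picks all columns except `id`:

```r
# Hypothetical example data
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), id = c(1, 1, 2, 2))

# Every column except the grouping column
value_cols <- setdiff(names(df), "id")

# FUN is applied to each selected column within each id group; by must be a list
s <- aggregate(df[, value_cols], by = list(id = df$id), FUN = sum)
print(s)
```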
This is actually what I was looking for, and it is mentioned in the FAQ:
dtb[,lapply(.SD,mean),by="id"]
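The same pattern works with sum in place of mean, which gives the per-group column sums the question asked for. The sketch below also shows `.SDcols` to restrict the columns, and one way to reuse colSums() itself by converting its named result to a list (`set.seed(1)` is added only to make the sample reproducible):

```r
library(data.table)

set.seed(1)
dtb <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Sum every non-grouping column within each id; columns stay separate
sums <- dtb[, lapply(.SD, sum), by = "id"]

# Restrict the computation to selected columns with .SDcols
sums_a <- dtb[, lapply(.SD, sum), by = "id", .SDcols = "a"]

# colSums() returns a named vector; as.list() turns it into one column per name
sums2 <- dtb[, as.list(colSums(.SD)), by = "id"]
print(sums)
```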