
aggregating multiple columns in data.table

I have the following sample data.table:

library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))

I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:

 dtb[,colSums, by="id"]

This is just a sample; my real table has many columns, so I want to avoid listing all of them in the call.

Alex asked Jul 27 '12 20:07




1 Answer

This is actually what I was looking for, and it is mentioned in the data.table FAQ (swap mean for sum to get per-group column sums):

dtb[,lapply(.SD,mean),by="id"]
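
For the column sums the question asks about, the same pattern can be sketched as follows. This is a minimal example, assuming the data.table package is installed. The seed value and the result names (sums, sums_a, sums_cs) are illustrative and not part of the original post.

```r
library(data.table)

set.seed(1)  # arbitrary seed (an assumption) so the sample data is reproducible
dtb <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

# Sum every non-grouping column within each id. Inside j, .SD is the
# per-group subset of the table without the grouping column (here: a and b).
sums <- dtb[, lapply(.SD, sum), by = "id"]

# Restrict the aggregation to selected columns with .SDcols.
sums_a <- dtb[, lapply(.SD, sum), by = "id", .SDcols = "a"]

# colSums can be used too, provided its named vector is turned into a list
# so that each element becomes its own column.
sums_cs <- dtb[, as.list(colSums(.SD)), by = "id"]
```

The original attempt fails because colSums is passed as a bare function rather than being called on .SD. Wrapping the call in as.list (or using lapply over .SD) gives data.table one value per column for each group.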
Alex answered Oct 22 '22 14:10
