Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate multiple columns at once [duplicate]

Tags:

r

aggregate

I have a data-frame likeso:

x <-
id1 id2    val1  val2 val3 val4
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8

I wish to aggregate the above by id1 & id2. I want to be able to get the means for val1, val2, val3, val4 at the same time.

How do i do this?

This is what i currently have but it works just for 1 column:

agg <- aggregate(x$val1, list(id11 = x$id1, id2= x$id2), mean)
names(agg)[3] <- c("val1")  # Rename the column

Also, how do i rename the columns which are outputted as means in the same statement given above

like image 298
Rookie Avatar asked Dec 30 '15 05:12

Rookie


People also ask

How do you aggregate on multiple columns?

We have to use the + operator to group multiple columns. In this example, We are going to group names and subjects to get sum of marks.

How do I group multiple columns in R?

Group By Multiple Columns in R using dplyrUse group_by() function in R to group the rows in DataFrame by multiple columns (two or more), to use this function, you have to install dplyr first using install. packages('dplyr') and load it using library(dplyr) . All functions in dplyr package take data.


2 Answers

We can use the formula method of aggregate. The variables on the 'rhs' of ~ are the grouping variables while the . represents all other variables in the 'df1' (from the example, we assume that we need the mean for all the columns except the grouping), specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)

Or we can use summarise_each from dplyr after grouping (group_by)

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))

Or using summarise with across (dplyr devel version - ‘0.8.99.9000’)

df1 %>% 
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))

Or another option is data.table. We convert the 'data.frame' to 'data.table' (setDT(df1), grouped by 'id1' and 'id2', we loop through the subset of data.table (.SD) and get the mean.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)] 

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", 
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"), 
val1 = c(1L, 
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))
like image 53
akrun Avatar answered Oct 20 '22 07:10

akrun


You could try:

agg <- aggregate(list(x$val1, x$val2, x$val3, x$val4), by = list(x$id1, x$id2), mean)
like image 43
Filipe Mencarini Avatar answered Oct 20 '22 09:10

Filipe Mencarini