Aggregating rows for multiple columns in R [duplicate]

Tags:

r

I have a df that basically looks like this:

Id  A  B  C  total
3   5  0  1  6
3   4  3  4  11
3   2  1  2  5
4   5  4  3  12
4   3  2  4  9
4   1  1  1  3

I want to collapse the rows by Id and get:

Id  A   B  C  total
3   11  4  7  22
4   9   7  8  24

I was able to do so for one column with:

df.grouped<- aggregate(df$A~Id, data=df, FUN="sum")

I have many columns (A-Z), so I need some kind of loop. I tried:

df.grouped<- aggregate(df[4:51]~Id, data=df, FUN="sum")
names(df.grouped)<-paste(names(df)[4:51])

But got:

Error in model.frame.default(formula = df[4:51] ~ Id, data = df) : 
invalid type (list) for variable 'df[4:51]'

As you can see, I also want the names in df.grouped to be the same as in df.

Any ideas will be very helpful

Thanks

user3315563 asked Mar 14 '23 14:03


1 Answer

We can use the formula method of aggregate. By specifying . on the LHS of ~, we select all the columns except the 'Id' column.

aggregate(.~Id, df, sum)
#   Id  A B C total
#1  3 11 4 7    22
#2  4  9 7 8    24
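
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data from the question and checks that the column names carry over unchanged. The data frame is typed in by hand from the table in the question; nothing else is assumed.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# '.' stands for every column that is not named on the right-hand side.
df.grouped <- aggregate(. ~ Id, data = df, FUN = sum)
print(df.grouped)

# The result keeps the original column names, so no renaming step is needed.
print(identical(names(df.grouped), names(df)))
```

One thing to keep in mind: the formula method uses na.action = na.omit by default, so a row with an NA in any column is dropped before summing.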

Alternatively, we can pass the columns directly, without using the formula method:

aggregate(df[2:ncol(df)],df['Id'], FUN=sum)
#  Id  A B C total
#1  3 11 4 7    22
#2  4  9 7 8    24
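
The attempt in the question failed because df[4:51] is a data.frame (a list), which cannot be used as a single variable on the left-hand side of a formula. If only some of the columns should be summed, the non-formula method accepts a sub-data.frame directly. A small sketch, where cols is a hypothetical selection standing in for names(df)[4:51] from the question:

```r
# Example data from the question.
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# Hypothetical subset of columns to aggregate (assumption for illustration).
cols <- c("A", "B")

# df[cols] stays a data.frame, and df["Id"] is a named list of grouping
# variables, so the output columns are named Id, A and B automatically.
df.sub <- aggregate(df[cols], by = df["Id"], FUN = sum)
print(df.sub)
```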

Other options include dplyr and data.table.

Using dplyr, we group by 'Id' and get the sum of all columns with summarise_each.

library(dplyr)
df %>%
  group_by(Id) %>%
  summarise_each(funs(sum))
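
Note that summarise_each() and funs() are deprecated in current dplyr releases. A sketch of the same idea with across(), assuming dplyr 1.0.0 or later is installed; the column range A:total is taken from the example data:

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# Sum every column from A through total within each Id.
res <- df %>%
  group_by(Id) %>%
  summarise(across(A:total, sum))
print(res)
```

The result is a tibble with the same column names as df.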

Or with data.table: setDT(df) converts the 'data.frame' to a 'data.table' by reference. Grouped by 'Id', we loop over the columns of the Subset of Data.table (.SD) with lapply and get the sum of each.

library(data.table)
setDT(df)[, lapply(.SD, sum), by = Id]
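
Because setDT() changes df in place, a variant that leaves the original data.frame untouched may be preferable in some scripts. A sketch using as.data.table() to work on a copy, with .SDcols restricting the sum to a hypothetical subset of columns (cols is an assumption for illustration):

```r
library(data.table)

# Example data from the question.
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# Hypothetical subset of columns to sum.
cols <- c("A", "B")

# as.data.table() returns a copy, so df itself remains a plain data.frame.
dt.grouped <- as.data.table(df)[, lapply(.SD, sum), by = Id, .SDcols = cols]
print(dt.grouped)
```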
akrun answered Mar 17 '23 15:03
