I have a df that basically looks like this:
Id A B C total
3 5 0 1 6
3 4 3 4 11
3 2 1 2 5
4 5 4 3 12
4 3 2 4 9
4 1 1 1 3
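To make the example reproducible, the data frame above can be rebuilt with data.frame(). The values are copied from the table. Storing them as numeric rather than integer is an assumption.

```r
# Reproducible version of the example data frame shown above
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# Sanity check: 'total' is the row-wise sum of A, B and C
stopifnot(all(df$total == df$A + df$B + df$C))
```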
I want to collapse the rows by Id and get:
Id A B C total
3 11 4 7 22
4 9 7 8 24
I was able to do so for one column with:
df.grouped<- aggregate(df$A~Id, data=df, FUN="sum")
I have many columns (A-Z), so I need some kind of loop. I tried:
df.grouped<- aggregate(df[4:51]~Id, data=df, FUN="sum")
names(df.grouped)<-paste(names(df)[4:51])
But got:
Error in model.frame.default(formula = df[4:51] ~ Id, data = df) :
invalid type (list) for variable 'df[4:51]'
As you can see, I also want the names in df.grouped to be the same as in df.
Any ideas would be very helpful.
Thanks
We can use the formula method of aggregate. By specifying . on the left-hand side (LHS) of ~, we select all the columns except the 'Id' column.
aggregate(.~Id, df, sum)
# Id A B C total
#1 3 11 4 7 22
#2 4 9 7 8 24
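The error in the question occurs because df[4:51] is a data frame (that is, a list), and model.frame() does not accept a list as a single variable. In the formula method, the . shorthand expands to the remaining columns. Listing columns explicitly with cbind() builds a matrix that aggregate accepts, and the result keeps the original column names. This is a sketch using the example columns A, B, C and total. It is useful when only some of the columns should be summed. Note that the formula method uses na.action = na.omit by default, so any row with an NA in the selected columns is dropped before summing.

```r
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# cbind() on the LHS turns the chosen columns into one matrix variable;
# aggregate() sums each column within every Id group and names the
# output columns after the variables listed inside cbind()
res <- aggregate(cbind(A, B, C, total) ~ Id, data = df, FUN = sum)
print(res)
```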
Alternatively, we can specify the columns directly, without the formula method:
aggregate(df[2:ncol(df)],df['Id'], FUN=sum)
# Id A B C total
#1 3 11 4 7 22
#2 4 9 7 8 24
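A variant of the call above selects the value columns by name instead of by position, so it still works if 'Id' is not the first column. The helper variable value_cols is only illustrative. The grouping argument is passed as df['Id'] (a one-column data frame, which is a list) rather than df$Id. Because of this, the output column keeps the name 'Id'. A bare vector would be rejected with "'by' must be a list".

```r
df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# Every column except the grouping column, selected by name
value_cols <- setdiff(names(df), "Id")

# Data-frame method: x = the columns to summarise, by = a named list of groups
res <- aggregate(df[value_cols], by = df["Id"], FUN = sum)
print(res)
```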
Other options include dplyr and data.table.
Using dplyr, we group by 'Id' and get the sum of all columns with summarise_each.
library(dplyr)
df %>%
group_by(Id) %>%
summarise_each(funs(sum))
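In current dplyr, summarise_each() and funs() are deprecated. Assuming dplyr 1.0.0 or later, the equivalent call uses across(). Inside a grouped summarise(), everything() does not include the grouping column 'Id', so every other column is summed. The .groups = "drop" argument only controls whether the result stays grouped.

```r
library(dplyr)

df <- data.frame(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

res <- df %>%
  group_by(Id) %>%
  # Apply sum() to every non-grouping column, keeping the original names
  summarise(across(everything(), sum), .groups = "drop")
print(res)
```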
Or, with data.table, we convert the data.frame to a data.table (setDT(df), which converts df in place), group by 'Id', loop with lapply() through the columns of the Subset of Data.table (.SD), and take the sum of each.
library(data.table)
setDT(df)[, lapply(.SD, sum), by = Id]
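By default, .SD contains every column except the by column. To sum only some columns, such as a range like the 4:51 in the question, .SDcols can restrict .SD. This sketch picks A, B and C by name. That particular choice is just for illustration.

```r
library(data.table)

dt <- data.table(
  Id    = c(3, 3, 3, 4, 4, 4),
  A     = c(5, 4, 2, 5, 3, 1),
  B     = c(0, 3, 1, 4, 2, 1),
  C     = c(1, 4, 2, 3, 4, 1),
  total = c(6, 11, 5, 12, 9, 3)
)

# Only these columns go into .SD; 'total' is left out here on purpose
sum_cols <- c("A", "B", "C")
res <- dt[, lapply(.SD, sum), by = Id, .SDcols = sum_cols]
print(res)
```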