I have a data.table that I want to aggregate:
library(data.table)
dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))
I want to sum amt by year and group, and then keep only the groups whose summed amt across all years is greater than 100.
The sum by year and group already works:
dt1[, sum(amt),by=list(year,group)]
year group V1
1: 2001 a 60
2: 2001 b 20
3: 2002 a 65
4: 2002 b 47
I am having trouble with the final filtering step.
The outcome I am looking for is:
year group V1
1: 2001 a 60
2: 2002 a 65
This is because for group a, 60 + 65 > 100,
whereas for group b, 20 + 47 <= 100.
Any thoughts on how to achieve this would be great.
I had a look at "data.table sum by group and return row with max value" and was wondering whether there is an equally elegant solution to my problem.
A one-liner in data.table:
dt1[, lapply(.SD,sum), by=.(year,group)][, if (sum(amt) > 100) .SD, by=group]
# group year amt
#1: a 2001 60
#2: a 2002 65
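For readers who find the chained call hard to follow, here is a minimal sketch of the same idea split into two steps. The intermediate names agg and res are hypothetical and not part of the original answer. It uses .(amt = sum(amt)) instead of lapply(.SD, sum) so the summed column keeps the name amt explicitly.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Step 1: total amt for each (year, group) pair.
# 'agg' is a hypothetical name for the intermediate table.
agg <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Step 2: within each group, return the group's rows (.SD) only if the
# group's total across all years exceeds 100; otherwise the 'if' yields
# NULL and the group is dropped from the result.
res <- agg[, if (sum(amt) > 100) .SD, by = group]

print(res)
```

The filter works because an if without an else returns NULL when the condition is false, and data.table omits groups whose j expression evaluates to NULL.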
With dplyr, you can do:
library(dplyr)
dt1 %>%
group_by(group, year) %>%
summarise(amt = sum(amt)) %>%
filter(sum(amt) > 100)
Which gives:
#Source: local data table [2 x 3]
#Groups: group
#
# year group amt
#1 2001 a 60
#2 2002 a 65
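The pipeline above relies on summarise() dropping the last grouping variable (year), so the filter() step runs once per group. Below is a sketch that makes this explicit, assuming dplyr 1.0.0 or later, where summarise() accepts a .groups argument. The names df1 and res are hypothetical, and a plain data.frame is used here instead of a data.table.

```r
library(dplyr)

df1 <- data.frame(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19),
                  stringsAsFactors = FALSE)

res <- df1 %>%
  group_by(group, year) %>%
  # "drop_last" removes only the last grouping variable (year),
  # so the result is still grouped by 'group'.
  summarise(amt = sum(amt), .groups = "drop_last") %>%
  # sum(amt) is evaluated per group, keeping groups whose total exceeds 100.
  filter(sum(amt) > 100) %>%
  ungroup()

print(res)
```

Stating .groups = "drop_last" does not change the result, but it documents the grouping that the filter depends on and silences the informational message newer dplyr versions print.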