data.table sum and subset

I have a data.table that I want to aggregate:

library(data.table)
dt1 <- data.table(year=c("2001","2001","2001","2002","2002","2002","2002"),
                  group=c("a","a","b","a","a","b","b"), 
                  amt=c(20,40,20,35,30,28,19))

I want to sum amt by year and group, and then keep only the groups whose total amt across all years is greater than 100.

I've got the data.table sum nailed.

dt1[, sum(amt),by=list(year,group)]

   year group V1
1: 2001     a 60
2: 2001     b 20
3: 2002     a 65
4: 2002     b 47

I am having trouble with my final level of filtering.

The end outcome I am looking for is:

   year group V1
1: 2001     a 60
2: 2002     a 65

This is because group a totals 60 + 65 = 125 > 100, whereas group b totals 20 + 47 = 67 <= 100.

Any thoughts on how to achieve this would be great.

I had a look at "data.table sum by group and return row with max value" and was wondering whether there is an equally elegant solution to my problem.

Dan asked May 12 '15 02:05


2 Answers

Single liner in data.table:

dt1[, lapply(.SD,sum), by=.(year,group)][, if (sum(amt) > 100) .SD, by=group]

#   group year amt
#1:     a 2001  60
#2:     a 2002  65
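
For readability, the same idea can be split into two steps. This is only a sketch of one possible way to write it, not the only one; the intermediate names agg, total and res are made up for illustration and do not appear in the question.

```r
library(data.table)

# Sample data from the question
dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Step 1: one row per (year, group) holding the summed amt
agg <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Step 2: attach each group's total across years (hypothetical helper column),
# then keep only the rows of groups whose total exceeds 100
agg[, total := sum(amt), by = group]
res <- agg[total > 100, .(year, group, amt)]
res
```

Naming the summed column amt in step 1 avoids the default V1 name, and the helper column makes the threshold being tested explicit, at the cost of an extra line compared with the chained version.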
thelatemail answered Sep 28 '22 05:09


You can do:

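library(dplyr)
dt1 %>% 
  group_by(group, year) %>% 
  # sum amt within each (group, year); summarise() then drops the last grouping level (year)
  summarise(amt = sum(amt)) %>%
  # the result is still grouped by group, so sum(amt) here is each group's total
  filter(sum(amt) > 100)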

Which gives:

#Source: local data table [2 x 3]
#Groups: group
#
#  year group amt
#1 2001     a  60
#2 2002     a  65
Steven Beaupré answered Sep 28 '22 04:09