Using data.table package, is it possible to summarise data preserving combinations of variables that do not appear in the input?
With plyr package I know how to do this with the .drop argument, for example:
require(plyr)
df <- data.frame(categories = c(rep("A",3), rep("B",3), rep("C",3)), groups = c(rep(c("X", "Y"),4), "Z"), values = rep(1, 9))
df1 <- ddply(df, c("categories","groups"), .drop = F, summarise, sum = sum(values))
output:
categories groups sum
1 A X 2
2 A Y 1
3 A Z 0
4 B X 1
5 B Y 2
6 B Z 0
7 C X 1
8 C Y 1
9 C Z 1
In this case I preserve all groups/categories combinations even if its sum is 0.
Great question. Here are two ways. They both use by-without-by.
DT = as.data.table(df)
setkey(DT,categories,groups)
DT[CJ(unique(categories),unique(groups)), sum(values,na.rm=TRUE)]
categories groups V1
1: A X 2
2: A Y 1
3: A Z 0
4: B X 1
5: B Y 2
6: B Z 0
7: C X 1
8: C Y 1
9: C Z 1
where CJ
stands for Cross Join, see ?CJ
. by-without-by just means that j
gets executed on each group that each row of i
joins to.
Admitedly it looks tricky on first sight. The idea is that if you have a known subset of groups, this syntax is faster than grouping everything and then selecting just the results from that you need. But in this case you'd like everything anyway so there's not much advantage, other than being able to lookup groups that don't exist in the data (which you can't do with by
).
Another way is to by
first as normal, then join the CJ()
result to that :
DT[,sum(values),keyby='categories,groups'][CJ(unique(categories),unique(groups))]
categories groups V1
1: A X 2
2: A Y 1
3: A Z NA
4: B X 1
5: B Y 2
6: B Z NA
7: C X 1
8: C Y 1
9: C Z 1
but then you get NA instead of the desired 0. Those could be replaced using set()
if need be. The second way might be faster because the two unique
calls are given much smaller input.
Both methods can be wrapped up into small helper functions if you do this a lot.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With