I have a data.table that I would like to perform group-by operations on, keeping the null (empty) grouping set and using several different sets of group-by variables.
A toy example:
library(data.table)
set.seed(1)
DT <- data.table(
  id = sample(c("US", "Other"), 25, replace = TRUE),
  loc = sample(LETTERS[1:5], 25, replace = TRUE),
  index = runif(25)
)
I would like to find the sum of index by all combinations of the key variables (including the null set). The concept is analogous to "grouping sets" in Oracle SQL. Here is an example of my current workaround:
rbind(
  DT[, list(id = "", loc = "", sindex = sum(index)), by = NULL],
  DT[, list(loc = "", sindex = sum(index)), by = "id"],
  DT[, list(id = "", sindex = sum(index)), by = "loc"],
  DT[, list(sindex = sum(index)), by = c("id", "loc")]
)[order(id, loc)]
       id loc      sindex
 1:           11.54218399
 2:         A  2.82172063
 3:         B  0.98639578
 4:         C  2.89149433
 5:         D  3.93292900
 6:         E  0.90964424
 7: Other      6.19514146
 8: Other   A  1.12107080
 9: Other   B  0.43809711
10: Other   C  2.80724742
11: Other   D  1.58392886
12: Other   E  0.24479728
13:    US      5.34704253
14:    US   A  1.70064983
15:    US   B  0.54829867
16:    US   C  0.08424691
17:    US   D  2.34900015
18:    US   E  0.66484697
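The same workaround can be written as a loop over a list of grouping sets, which avoids spelling out one DT[...] call per set. This is only a minimal sketch: sum_by_sets is a hypothetical helper name, the aggregate is hard-coded to sum(index), and the aggregated levels come back as NA (from fill = TRUE) rather than "".

```r
library(data.table)

set.seed(1)
DT <- data.table(
  id = sample(c("US", "Other"), 25, replace = TRUE),
  loc = sample(LETTERS[1:5], 25, replace = TRUE),
  index = runif(25)
)

# Hypothetical helper: aggregate once per grouping set, then stack the pieces.
sum_by_sets <- function(dt, sets) {
  pieces <- lapply(sets, function(b) {
    if (length(b)) {
      dt[, list(sindex = sum(index)), by = b]
    } else {
      # Empty grouping set: grand total, no by columns.
      dt[, list(sindex = sum(index))]
    }
  })
  # fill = TRUE pads the grouping columns missing from a piece with NA.
  rbindlist(pieces, use.names = TRUE, fill = TRUE)
}

sets <- list(character(0), "id", "loc", c("id", "loc"))
res <- sum_by_sets(DT, sets)
setcolorder(res, c("id", "loc", "sindex"))
setorder(res, id, loc, na.last = FALSE)  # aggregated (NA) levels first
res
```

Adding or dropping a grouping set then only means editing the sets list.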
Is there a preferred "data table" way to accomplish this?
As of this commit, this is now possible with the dev version of data.table, using cube or groupingsets:
library("data.table")
# data.table 1.10.5 IN DEVELOPMENT built 2017-08-08 18:31:51 UTC
# The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
# Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
# Release notes, videos and slides: http://r-datatable.com
cube(DT, list(sindex = sum(index)), by = c("id", "loc"))
# id loc sindex
# 1: US B 0.54829867
# 2: US A 1.70064983
# 3: Other B 0.43809711
# 4: Other E 0.24479728
# 5: Other C 2.80724742
# 6: Other A 1.12107080
# 7: US E 0.66484697
# 8: US D 2.34900015
# 9: Other D 1.58392886
# 10: US C 0.08424691
# 11: NA B 0.98639578
# 12: NA A 2.82172063
# 13: NA E 0.90964424
# 14: NA C 2.89149433
# 15: NA D 3.93292900
# 16: US NA 5.34704253
# 17: Other NA 6.19514146
# 18: NA NA 11.54218399
groupingsets(DT, j = list(sindex = sum(index)), by = c("id", "loc"), sets = list(character(), "id", "loc", c("id", "loc")))
# id loc sindex
# 1: NA NA 11.54218399
# 2: US NA 5.34704253
# 3: Other NA 6.19514146
# 4: NA B 0.98639578
# 5: NA A 2.82172063
# 6: NA E 0.90964424
# 7: NA C 2.89149433
# 8: NA D 3.93292900
# 9: US B 0.54829867
# 10: US A 1.70064983
# 11: Other B 0.43809711
# 12: Other E 0.24479728
# 13: Other C 2.80724742
# 14: Other A 1.12107080
# 15: US E 0.66484697
# 16: US D 2.34900015
# 17: Other D 1.58392886
# 18: US C 0.08424691
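To get the layout shown in the question (sorted by id and loc, with blanks instead of NA), the cube result can be post-processed. The following is a minimal sketch, assuming a data.table build that includes cube (the dev version mentioned above or a later release); res and res_id are illustrative names only.

```r
library(data.table)

set.seed(1)
DT <- data.table(
  id = sample(c("US", "Other"), 25, replace = TRUE),
  loc = sample(LETTERS[1:5], 25, replace = TRUE),
  index = runif(25)
)

res <- cube(DT, list(sindex = sum(index)), by = c("id", "loc"))

# Sort by id, then loc, with the aggregated (NA) levels first.
setorder(res, id, loc, na.last = FALSE)

# Optional: show aggregated levels as "" as in the question.
# After this step they can no longer be told apart from genuine empty strings.
res[is.na(id), id := ""]
res[is.na(loc), loc := ""]
res

# id = TRUE adds a leading `grouping` column that identifies the grouping set
# of each row, which helps when id or loc may contain real NA values.
res_id <- cube(DT, list(sindex = sum(index)), by = c("id", "loc"), id = TRUE)
res_id
```

Keeping the NA markers (or the grouping column) is probably the safer choice if the result feeds further computation; the blanks are mainly useful for display.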