I've got a data.table
in R:
library(data.table)
set.seed(1)
DT = data.table(
group=sample(letters[1:2],100,replace=TRUE),
year=sample(2010:2012,100,replace=TRUE),
v=runif(100))
Aggregating this data into a summary table by group and year is simple and elegant:
table <- DT[,mean(v),by='group, year']
However, aggregating this data into a summary table, including subtotals and grand totals, is a little more difficult, and a lot less elegant:
library(plyr)
yearTot <- DT[,list(mean(v),year='Total'),by='group']
groupTot <- DT[,list(mean(v),group='Total'),by='year']
Tot <- DT[,list(mean(v), year='Total', group='Total')]
table <- rbind.fill(table,yearTot,groupTot,Tot)
table$group[table$group==1] <- 'Total'
table$year[table$year==1] <- 'Total'
This yields:
table[order(table$group, table$year), ]
Is there a simple way to specify subtotals and grand totals with data.table, such as the margins=TRUE
command for plyr? I would prefer to use data.table over plyr on my dataset, as it is a very large dataset that I already have in the data.table format.
A grand total is a total of all calculations on a report, per metric. A subtotal is a total of a specific subset of metric data, totaled at a level you select. Common subtotal functions include sum, count, minimum, maximum, average, mean, and median.
In order to calculate a subtotal in SQL query, we can use the ROLLUP extension of the GROUP BY statement. The ROLLUP extension allows us to generate hierarchical subtotal rows according to its input columns and it also adds a grand total row to the result set.
In recent devel data.table you can use new feature called "grouping sets" to produce sub totals:
library(data.table)
set.seed(1)
DT = data.table(
group=sample(letters[1:2],100,replace=TRUE),
year=sample(2010:2012,100,replace=TRUE),
v=runif(100))
cube(DT, mean(v), by=c("group","year"))
# group year V1
# 1: a 2011 0.4176346
# 2: b 2010 0.5231845
# 3: b 2012 0.4306871
# 4: b 2011 0.4997119
# 5: a 2012 0.4227796
# 6: a 2010 0.2926945
# 7: NA 2011 0.4463616
# 8: NA 2010 0.4278093
# 9: NA 2012 0.4271160
#10: a NA 0.3901875
#11: b NA 0.4835788
#12: NA NA 0.4350153
cube(DT, mean(v), by=c("group","year"), id=TRUE)
# grouping group year V1
# 1: 0 a 2011 0.4176346
# 2: 0 b 2010 0.5231845
# 3: 0 b 2012 0.4306871
# 4: 0 b 2011 0.4997119
# 5: 0 a 2012 0.4227796
# 6: 0 a 2010 0.2926945
# 7: 2 NA 2011 0.4463616
# 8: 2 NA 2010 0.4278093
# 9: 2 NA 2012 0.4271160
#10: 1 a NA 0.3901875
#11: 1 b NA 0.4835788
#12: 3 NA NA 0.4350153
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With