Aggregating sub totals and grand totals with data.table

I've got a data.table in R:

library(data.table)
set.seed(1)
DT = data.table(
  group=sample(letters[1:2],100,replace=TRUE), 
  year=sample(2010:2012,100,replace=TRUE),
  v=runif(100))

Aggregating this data into a summary table by group and year is simple and elegant:

table <- DT[,mean(v),by='group, year']

However, aggregating this data into a summary table, including subtotals and grand totals, is a little more difficult, and a lot less elegant:

library(plyr)
yearTot <- DT[,list(mean(v),year='Total'),by='group']
groupTot <- DT[,list(mean(v),group='Total'),by='year']
Tot <- DT[,list(mean(v), year='Total', group='Total')]
table <- rbind.fill(table,yearTot,groupTot,Tot)
table$group[table$group==1] <- 'Total'
table$year[table$year==1] <- 'Total'

The combined table, sorted by group and year, can then be printed with:

table[order(table$group, table$year), ]

Is there a simple way to specify subtotals and grand totals with data.table, similar to a margins=TRUE argument? I would prefer to stay with data.table rather than switch to plyr, since the dataset is very large and already in data.table format.

asked Feb 16 '12 16:02

Zach

1 Answer

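In recent versions of data.table (the development version when this answer was written), you can use the "grouping sets" feature, provided by cube(), rollup() and groupingsets(), to produce subtotals and a grand total. Rows where a by column is NA are totals over that column: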

library(data.table)
set.seed(1)
DT = data.table(
    group=sample(letters[1:2],100,replace=TRUE), 
    year=sample(2010:2012,100,replace=TRUE),
    v=runif(100))

cube(DT, mean(v), by=c("group","year"))
#    group year        V1
# 1:     a 2011 0.4176346
# 2:     b 2010 0.5231845
# 3:     b 2012 0.4306871
# 4:     b 2011 0.4997119
# 5:     a 2012 0.4227796
# 6:     a 2010 0.2926945
# 7:    NA 2011 0.4463616
# 8:    NA 2010 0.4278093
# 9:    NA 2012 0.4271160
#10:     a   NA 0.3901875
#11:     b   NA 0.4835788
#12:    NA   NA 0.4350153
# id=TRUE adds a "grouping" column showing which by-columns were aggregated away
cube(DT, mean(v), by=c("group","year"), id=TRUE)
#    grouping group year        V1
# 1:        0     a 2011 0.4176346
# 2:        0     b 2010 0.5231845
# 3:        0     b 2012 0.4306871
# 4:        0     b 2011 0.4997119
# 5:        0     a 2012 0.4227796
# 6:        0     a 2010 0.2926945
# 7:        2    NA 2011 0.4463616
# 8:        2    NA 2010 0.4278093
# 9:        2    NA 2012 0.4271160
#10:        1     a   NA 0.3901875
#11:        1     b   NA 0.4835788
#12:        3    NA   NA 0.4350153
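
To get the 'Total' labels from the question instead of NA, the grouping column can drive the relabelling. The following is a minimal sketch, not part of the original answer: the column name mean_v and the 'Total' label are illustrative choices, and the grouping codes (1 = year aggregated, 2 = group aggregated, 3 = both) are read off the id=TRUE output above.

```r
library(data.table)

set.seed(1)
DT <- data.table(
  group = sample(letters[1:2], 100, replace = TRUE),
  year  = sample(2010:2012, 100, replace = TRUE),
  v     = runif(100)
)

# cube() aggregates over every combination of the `by` columns;
# id = TRUE adds a `grouping` column marking which columns were aggregated away.
# `mean_v` is a name chosen here for illustration.
res <- cube(DT, list(mean_v = mean(v)), by = c("group", "year"), id = TRUE)

# `year` is integer, so convert it to character before writing a label into it.
res[, year := as.character(year)]

# Relabel using `grouping` rather than is.na(), so that genuine NA values
# in the data would not be mistaken for totals.
res[grouping %in% c(2L, 3L), group := "Total"]  # rows aggregated over group
res[grouping %in% c(1L, 3L), year := "Total"]   # rows aggregated over year

# The helper column is no longer needed once the labels are in place.
res[, grouping := NULL]
print(res)
```

If only some of the subtotal levels are needed, rollup() takes the same j and by arguments and produces the hierarchical subset (group and year, group only, grand total), while groupingsets() accepts an explicit list of column sets through its sets argument.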

answered Nov 05 '22 12:11

jangorecki

Tags: r, aggregate, data.table, plyr
