Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr summarize with subtotals

Tags:

One of the great things about pivot tables in excel is that they provide subtotals automatically. First, I would like to know if there is anything already created within dplyr that can accomplish this. If not, what is the easiest way to achieve it?

In the example below, I show the mean displacement by number of cylinders and carburetors. For each group of cylinders (4,6,8), I'd like to see the mean displacement for the group (or total displacement, or any other summary statistic).

library(dplyr) mtcars %>% group_by(cyl,carb) %>% summarize(mean(disp))    cyl carb mean(disp) 1   4    1      91.38 2   4    2     116.60 3   6    1     241.50 4   6    4     163.80 5   6    6     145.00 6   8    2     345.50 7   8    3     275.80 8   8    4     405.50 9   8    8     301.00 
like image 691
Kyle Ward Avatar asked Jul 01 '15 14:07

Kyle Ward


1 Answers

data.table It's very clunky, but this is one way:

library(data.table) DT <- data.table(mtcars) rbind(   DT[,.(mean(disp)),          by=.(cyl,carb)],   DT[,.(mean(disp), carb=NA), by=.(cyl) ],   DT[,.(mean(disp), cyl=NA),  by=.(carb)] )[order(cyl,carb)] 

This gives

    cyl carb       V1  1:   4    1  91.3800  2:   4    2 116.6000  3:   4   NA 105.1364  4:   6    1 241.5000  5:   6    4 163.8000  6:   6    6 145.0000  7:   6   NA 183.3143  8:   8    2 345.5000  9:   8    3 275.8000 10:   8    4 405.5000 11:   8    8 301.0000 12:   8   NA 353.1000 13:  NA    1 134.2714 14:  NA    2 208.1600 15:  NA    3 275.8000 16:  NA    4 308.8200 17:  NA    6 145.0000 18:  NA    8 301.0000 

I'd rather see results in something like an R table, but don't know of any functions for that.


dplyr @akrun found this analogous code

bind_rows(   mtcars %>%      group_by(cyl, carb) %>%      summarise(Mean= mean(disp)),    mtcars %>%      group_by(cyl) %>%      summarise(carb=NA, Mean=mean(disp)),    mtcars %>%      group_by(carb) %>%      summarise(cyl=NA, Mean=mean(disp)) ) %>% arrange(cyl, carb) 

We could wrap the repeat operations in a function

library(lazyeval) f1 <- function(df, grp, Var, func){   FUN <- match.fun(func)    df %>%       group_by_(.dots=grp) %>%      summarise_(interp(~FUN(v), v=as.name(Var)))   }   m1 <- f1(mtcars, c('carb', 'cyl'), 'disp', 'mean')  m2 <- f1(mtcars, 'carb', 'disp', 'mean')  m3 <- f1(mtcars, 'cyl', 'disp', 'mean')   bind_rows(list(m1, m2, m3)) %>%               arrange(cyl, carb) %>%               rename(Mean=`FUN(disp)`)    carb cyl     Mean 1     1   4  91.3800 2     2   4 116.6000 3    NA   4 105.1364 4     1   6 241.5000 5     4   6 163.8000 6     6   6 145.0000 7    NA   6 183.3143 8     2   8 345.5000 9     3   8 275.8000 10    4   8 405.5000 11    8   8 301.0000 12   NA   8 353.1000 13    1  NA 134.2714 14    2  NA 208.1600 15    3  NA 275.8000 16    4  NA 308.8200 17    6  NA 145.0000 18    8  NA 301.0000 

Either option can be made a little less ugly with data.table's rbindlist with fill:

rbindlist(list(   mtcars %>% group_by(cyl) %>% summarise(mean(disp)),   mtcars %>% group_by(carb) %>% summarise(mean(disp)),   mtcars %>% group_by(cyl,carb) %>% summarise(mean(disp)) ),fill=TRUE) %>% arrange(cyl,carb)  rbindlist(list(   DT[,mean(disp),by=.(cyl,carb)],   DT[,mean(disp),by=.(cyl)],   DT[,mean(disp),by=.(carb)] ),fill=TRUE)[order(cyl,carb)] 
like image 187
Frank Avatar answered Oct 25 '22 01:10

Frank