
Add margin row totals in dplyr chain

Tags:

r

dplyr

I would like to add overall summary rows while also calculating summaries by group using dplyr. I have found various questions asking how to do this, e.g. here, here, and here, but no clear solution. One possible approach is to perform count twice and bind the rows:

library(dplyr)

mtcars %>% 
  count(cyl, gear) %>% 
  bind_rows(
    count(mtcars, gear)
  )

which nearly produces what I need (the left-most column has NAs rather than 'Total' or similar):

     cyl  gear     n
   <dbl> <dbl> <int>
1      4     3     1
2      4     4     8
3      4     5     2
4      6     3     2
5      6     4     4
6      6     5     1
7      8     3    12
8      8     5     2
9     NA     3    15
10    NA     4    12
11    NA     5     5

Am I missing an easier/built-in solution?

Jonny asked Sep 15 '16 09:09


2 Answers

With adorn_totals() from the janitor package:

library(janitor)

mtcars %>%
  tabyl(cyl, gear) %>%
  adorn_totals("row")

   cyl  3  4 5
     4  1  8 2
     6  2  4 1
     8 12  0 2
 Total 15 12 5

To get from there to the "long" form in your post, add tidyr::gather() to the pipeline:

mtcars %>%
  tabyl(cyl, gear) %>%
  adorn_totals("row") %>%
  tidyr::gather(gear, n, 2:ncol(.), convert = TRUE)

     cyl gear  n
1      4    3  1
2      6    3  2
3      8    3 12
4  Total    3 15
5      4    4  8
6      6    4  4
7      8    4  0
8  Total    4 12
9      4    5  2
10     6    5  1
11     8    5  2
12 Total    5  5
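As a side note, tidyr's documentation marks gather() as superseded in favor of pivot_longer(). A minimal sketch of the same reshaping step with pivot_longer() might look like the following. The as_tibble() call is my own precaution to drop the tabyl class before reshaping, and the object name `long` is just illustrative. Note that pivot_longer() returns the rows in a different order than gather() (row by row rather than column by column).

```r
library(dplyr)
library(janitor)
library(tidyr)

long <- mtcars %>%
  tabyl(cyl, gear) %>%          # wide table of counts: one column per gear
  adorn_totals("row") %>%       # append a "Total" row; cyl becomes character
  as_tibble() %>%               # assumption: plain tibble before reshaping
  pivot_longer(
    -cyl,                       # reshape every column except cyl
    names_to = "gear",
    values_to = "n",
    names_transform = list(gear = as.integer)  # "3", "4", "5" -> integers
  )

long
```

As with the gather() version, combinations that do not occur in the data (8 cylinders with 4 gears) show up with a count of 0, because tabyl() fills in the full cross-tabulation.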

Self-promotion alert: I authored this package. I'm adding this answer because it's a genuinely efficient solution here.

Sam Firke answered Oct 14 '22 19:10


One option is to use do():

library(dplyr)

mtcars %>%
   count(cyl, gear) %>%
   ungroup() %>% 
   mutate(cyl = as.character(cyl)) %>%   # character, so the column can hold "Total"
   do(bind_rows(., data.frame(cyl = "Total", count(mtcars, gear)))) 
   # or replace the last 'do' step with 
   # bind_rows(cbind(cyl = 'Total', count(mtcars, gear)))  # from @JonnyPolonsky's comments

#      cyl  gear     n
#   <chr> <dbl> <int>
#1      4     3     1
#2      4     4     8
#3      4     5     2
#4      6     3     2
#5      6     4     4
#6      6     5     1
#7      8     3    12
#8      8     5     2
#9  Total     3    15
#10 Total     4    12
#11 Total     5     5
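
The same idea can also be written without do(), by building the per-group counts and the totals as two separate data frames and stacking them. This is only a sketch of that variant. The object names `by_group`, `totals`, and `result` are illustrative and not part of the original answer.

```r
library(dplyr)

# Counts per cyl/gear combination; cyl is converted to character
# so that it can share a column with the "Total" label.
by_group <- mtcars %>%
  count(cyl, gear) %>%
  mutate(cyl = as.character(cyl))

# Counts per gear only, labelled as the margin rows.
totals <- mtcars %>%
  count(gear) %>%
  mutate(cyl = "Total")

# bind_rows() matches columns by name, so the differing column
# order in `totals` does not matter.
result <- bind_rows(by_group, totals)

result
```

Because bind_rows() matches columns by name, the result keeps the column order of the first data frame (cyl, gear, n) with the total rows appended at the end.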
akrun answered Oct 14 '22 19:10