Summarize data at different aggregate levels - R and tidyverse

I'm creating a bunch of basic status reports, and one of the things I find tedious is adding a total row to every table. I'm currently using the tidyverse approach, and below is an example of my current code. What I'm looking for is an option to include a few different levels of summarization by default.

library(dplyr)

#load into RStudio viewer (not required)
iris = iris

#summary at the group level
summary_grouped = iris %>% 
       group_by(Species) %>%
       summarize(mean_s_length = mean(Sepal.Length),
                 max_s_width = max(Sepal.Width))

#summary at the overall level
summary_overall = iris %>% 
  summarize(mean_s_length = mean(Sepal.Length),
            max_s_width = max(Sepal.Width)) %>%
  mutate(Species = "Overall")

#append results for report       
summary_table = rbind(summary_grouped, summary_overall)

Doing this over and over is very tedious. What I'd like is something along these lines (hypothetical syntax, not an existing option):

summary_overall = iris %>% 
       group_by(Species, total = TRUE) %>%
       summarize(mean_s_length = mean(Sepal.Length),
                 max_s_width = max(Sepal.Width))

FYI - if you're familiar with SAS, I'm looking for the same type of functionality that the CLASS, WAYS, and TYPES statements provide in PROC MEANS, which let me control the level of summarization and get multiple levels in one call.

Any help is appreciated. I know I can create my own function, but was hoping there is something that already exists. I would also prefer to stick with the tidyverse style of programming though I'm not set on that.


asked Jun 21 '19 19:06

Reeza

1 Answer

Another alternative:

library(tidyverse)  

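iris %>% 
  # convert Species to character so the "Overall" label can be filled in later
  mutate(Species = as.character(Species)) %>%
  # build a list of two inputs: the data grouped by Species, and the ungrouped data
  list(group_by(., Species), .) %>%
  # apply the same summary to both inputs
  map(~summarize(., mean_s_length = mean(Sepal.Length),
                 max_s_width = max(Sepal.Width))) %>%
  # stack the results; the ungrouped summary has no Species column, so it becomes NA
  bind_rows() %>%
  replace_na(list(Species = "Overall"))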
#> # A tibble: 4 x 3
#>   Species    mean_s_length max_s_width
#>   <chr>              <dbl>       <dbl>
#> 1 setosa              5.01         4.4
#> 2 versicolor          5.94         3.4
#> 3 virginica           6.59         3.8
#> 4 Overall             5.84         4.4
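
Since the question mentions writing a custom function as a fallback, here is a minimal sketch of how the same pattern could be wrapped for reuse. The name `summarize_with_total()` and its `total_label` argument are hypothetical (nothing like this ships with dplyr), and the sketch assumes dplyr 1.0 or later for `.groups` and the `"{{ }}" :=` naming syntax.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: summarize by one grouping column
# and append a single overall row labelled with `total_label`.
summarize_with_total <- function(data, group_col, ..., total_label = "Overall") {
  # Store the grouping column as character so the label can be appended
  data <- mutate(data, "{{ group_col }}" := as.character({{ group_col }}))

  # Summary at the group level
  grouped <- data %>%
    group_by({{ group_col }}) %>%
    summarize(..., .groups = "drop")

  # Same summary on the ungrouped data, labelled as the total row
  overall <- data %>%
    summarize(...) %>%
    mutate("{{ group_col }}" := total_label)

  # bind_rows() matches columns by name, so column order does not matter
  bind_rows(grouped, overall)
}

summary_table <- summarize_with_total(
  iris, Species,
  mean_s_length = mean(Sepal.Length),
  max_s_width = max(Sepal.Width)
)
```

This handles one grouping column only; the summary expressions are passed through `...` and evaluated twice, once per level.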

answered Sep 19 '22 13:09

Moody_Mudskipper
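
The question also asks about getting several levels of summarization in one call, as in PROC MEANS. The following is only a rough sketch of that idea and not an equivalent of the SAS statements: it loops over a hand-written list of grouping sets (the `grouping_sets` list is an illustrative assumption) and stacks the results, using `mtcars` because it has two natural class variables.

```r
library(dplyr)

# Illustrative grouping sets: overall, by cyl, by gear, and by cyl x gear
grouping_sets <- list(character(0), "cyl", "gear", c("cyl", "gear"))

multi_level <- lapply(grouping_sets, function(vars) {
  mtcars %>%
    # an empty set of variables leaves the data ungrouped (overall level)
    group_by(!!!syms(vars)) %>%
    summarize(n = n(), mean_mpg = mean(mpg), .groups = "drop")
}) %>%
  bind_rows() %>%
  # put the class variables first; NA marks a variable not used at that level
  select(cyl, gear, everything())
```

In the stacked result, a row with `NA` in both `cyl` and `gear` is the overall total, and a row with `NA` in only one of them is a one-way summary.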


Tags: r, dplyr, tidyverse, group-summaries
