Using dplyr summarize with different operations for multiple columns

Tags:

dplyr

Well, I know that there are already tons of related questions, but none gave an answer to my particular need.

I want to use dplyr "summarize" on a table with 50 columns, and I need to apply different summary functions to these.

"summarize_all" and "summarize_at" both seem to have the disadvantage that they cannot apply different functions to different subsets of variables.

As an example, let's assume the iris dataset had 50 columns, so we do not want to address columns by name. I want the sum over the first two columns, the mean over the third, and the first value for all remaining columns (after a group_by(Species)). How could I do this?


asked Feb 23 '18 09:02

CodingButStillAlive

1 Answer

Fortunately, there is now a much simpler way. dplyr 1.0.0 introduces the across function, which you can use for exactly this purpose.

All you need to type is:

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  summarize(
    # the sum over the first two columns
    across(c(1, 2), sum),
    # the mean over the third
    across(3, mean),
    # the first value for all remaining columns (the grouping column Species is skipped)
    across(-c(1:3), first)
  )

Great, isn't it? At first I thought across was unnecessary because the scoped variants worked just fine, but this use case is exactly where across is very beneficial.
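To make it easier to see what the positional selections resolve to, here is a small sketch that runs the same summary twice on iris: once by position with across, and once spelled out by column name. The object names by_position and by_name are only illustrative, and the comparison assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Positional version, same selections as in the answer.
by_position <- iris %>%
  group_by(Species) %>%
  summarize(
    across(c(1, 2), sum),
    across(3, mean),
    across(-c(1:3), first)
  )

# The same summary written out by column name, for comparison.
by_name <- iris %>%
  group_by(Species) %>%
  summarize(
    Sepal.Length = sum(Sepal.Length),
    Sepal.Width  = sum(Sepal.Width),
    Petal.Length = mean(Petal.Length),
    Petal.Width  = first(Petal.Width)
  )

# Compare the two results as plain data frames.
all.equal(as.data.frame(by_position), as.data.frame(by_name))
```

With 50 columns the by-name version would get very long, which is the point of selecting by position inside across.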

across requires dplyr 1.0.0 or later. You can get the development version of dplyr with devtools::install_github("tidyverse/dplyr").
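If you are unsure whether your installed dplyr already includes across, a quick check along these lines may help. It is a minimal sketch; the variable name has_across is just illustrative.

```r
# across() was introduced in dplyr 1.0.0, so compare the installed version against it.
has_across <- packageVersion("dplyr") >= "1.0.0"

if (!has_across) {
  message("dplyr is older than 1.0.0; update it before using across().")
}
```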

answered Nov 03 '22 00:11

Agile Bean
