Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parametrize function calls in dplyr 0.7?

Tags:

r

dplyr

rlang

The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.

Here is a common idiom I use when building reporting and aggregation functions with dplyr:

my_report <- function(data, grouping_vars) {
  data %>%
    group_by_(.dots=grouping_vars) %>%
    summarize(x_mean=mean(x), x_median=median(x), ...)
}

Here, grouping_vars is a vector of strings.

I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, but it's also not too bad for interactive work either.

However, in the new programming with dplyr vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples of how passing strings is no longer the correct approach, and I have to use quosures instead.

I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.

Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:

library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#>      am mean_cyl
#>   <dbl>    <dbl>
#> 1     0 6.947368
#> 2     1 5.076923

grouping_vars <- "am"
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#>   `"am"` mean_cyl
#>    <chr>    <dbl>
#> 1     am   6.1875
like image 355
Paul Avatar asked Apr 14 '17 16:04

Paul


3 Answers

dplyr will have a specialized group_by function group_by_at to deal with multiple grouping variables. It would be much easier to use the new member of the _at family:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
    group_by_at(.vars = cols) %>%
    summarise(mean_cyl=mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
# 
# am  gear mean_cyl
# <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

The .vars argument accepts both character/numeric vector or column names generated by vars:

.vars

A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.

like image 121
mt1022 Avatar answered Nov 15 '22 21:11

mt1022


Here's the quick and dirty reference I wrote for myself.

# install.packages("rlang")
library(tidyverse)

dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))

Representing column names with strings

Convert strings to symbol objects using rlang::sym and rlang::syms.

summ_var <- "value"
group_vars <- c("cat", "cat2")

summ_sym <- rlang::sym(summ_var)  # capture a single symbol
group_syms <- rlang::syms(group_vars)  # creates list of symbols

dat %>%
  group_by(!!!group_syms) %>%  # splice list of symbols into a function call
  summarize(summ = sum(!!summ_sym)) # slice single symbol into call

If you use !! or !!! outside of dplyr functions you will get an error.

The usage of rlang::sym and rlang::syms is identical inside functions.

summarize_by <- function(df, summ_var, group_vars) {

  summ_sym <- rlang::sym(summ_var)
  group_syms <- rlang::syms(group_vars)

  df %>%
    group_by(!!!group_syms) %>%
    summarize(summ = sum(!!summ_sym))
}

We can then call summarize_by with string arguments.

summarize_by(dat, "value", c("cat", "cat2"))

Using non-standard evaluation for column/variable names

summ_quo <- quo(value)  # capture a single variable for NSE
group_quos <- quos(cat, cat2)  # capture list of variables for NSE

dat %>%
  group_by(!!!group_quos) %>%  # use !!! with both quos and rlang::syms
  summarize(summ = sum(!!summ_quo))  # use !! both quo and rlang::sym

Inside functions use enquo rather than quo. quos is okay though!?

summarize_by <- function(df, summ_var, ...) {

  summ_quo <- enquo(summ_var)  # can only capture a single value!
  group_quos <- quos(...)  # captures multiple values, also inside functions!?

  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}

And then our function call is

summarize_by(dat, value, cat, cat2)
like image 41
alexpghayes Avatar answered Nov 15 '22 20:11

alexpghayes


If you want to group by possibly more than one column, you can use quos

grouping_vars <- quos(am, gear)
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

Right now, it doesn't seem like there's a great way to turn strings into quos. Here's one way that does work though

cols <- c("am","gear")
grouping_vars <- rlang::parse_quosures(paste(cols, collapse=";"))
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
like image 6
MrFlick Avatar answered Nov 15 '22 20:11

MrFlick