
How can I use dplyr across() programmatically on no variables?

Issue:

I want to use across() programmatically so that the function doesn't fail when, e.g., NULL or an empty string is passed to it. This is possible using scoped variants of functions such as group_by_at(), but I'd like to make it work neatly (i.e. without if-statements) using across().

Note also that across() currently affects all columns when it is called with no column selection. I'm unsure what the motivation for this is; to me it would make more sense if no columns were affected.

Example

Here's a quick example using functions to calculate the mean of a variable y. Passing NULL as the grouping works with group_by_at(), but not with across(), as shown:

library(dplyr)

my_df <- tibble(x = c("a", "a", "b", "b"), y = 1:4)

compute_mean1 <- function(df, grouping) { # compute grouped mean with across()
  df %>% 
    group_by(across(all_of(grouping))) %>% 
    summarise(y = mean(y), .groups = "drop")
}

compute_mean2 <- function(df, grouping) { # compute grouped mean with group_by_at()
  df %>% 
    group_by_at(grouping) %>% 
    summarise(y = mean(y), .groups = "drop")
}


compute_mean1(my_df, "x")
#> # A tibble: 2 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3.5
compute_mean1(my_df, NULL)
#> Error: `vars` must be a character vector.
compute_mean2(my_df, "x")
#> # A tibble: 2 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3.5
compute_mean2(my_df, NULL)
#> # A tibble: 1 x 1
#>       y
#>   <dbl>
#> 1   2.5

Created on 2020-07-14 by the reprex package (v0.3.0)

asked Jul 14 '20 15:07 by wurli

1 Answer

Use .add = TRUE in group_by(), like this:

compute_mean3 <- function(df, grouping) { # compute grouped mean with across()
  df %>% 
    group_by(across(all_of(grouping)), .add = TRUE) %>%
    summarise(y = mean(y), .groups = "drop")
}
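
For context, here is a minimal sketch of what .add does in general, assuming dplyr 1.0.0 or later (where the argument is named .add). It only illustrates how .add changes the resulting grouping and does not cover the NULL case itself; the names replaced and appended are just illustrative.

```r
library(dplyr)

my_df <- tibble(x = c("a", "a", "b", "b"), y = 1:4)

# Without .add, a second group_by() replaces the existing grouping.
replaced <- my_df %>%
  group_by(x) %>%
  group_by(y)
group_vars(replaced)
#> [1] "y"

# With .add = TRUE, the new variables are appended to the existing groups.
appended <- my_df %>%
  group_by(x) %>%
  group_by(y, .add = TRUE)
group_vars(appended)
#> [1] "x" "y"
```

Since my_df is ungrouped when it enters compute_mean3(), there are no existing groups to keep, so for a non-empty grouping such as "x" the result should match compute_mean1().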
 
answered Sep 28 '22 07:09 by G. Grothendieck