Using case_when with dplyr across

Tags: r, dplyr

I'm trying to translate a mutate_at() call into a mutate() call using dplyr's new across() function, and I'm a bit stumped.

In a nutshell, I need to compare the values in a series of columns to a "baseline" column. When the values in the columns are higher than the baseline, I need to use the baseline value. When the values in the columns are lower than or equal to the baseline, I need to keep the value. Here's an example dataset (my actual dataset is much larger):

test <- structure(list(baseline = c(5, 7, 8, 4, 9, 1, 0, 46, 47), bob = c(7, 
11, 34, 9, 6, 8, 3, 49, 12), sally = c(3, 5, 2, 2, 6, 1, 3, 4, 
56), rita = c(6, 4, 6, 7, 6, 0, 3, 11, 3)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), spec = structure(list(
    cols = list(baseline = structure(list(), class = c("collector_double", 
    "collector")), bob = structure(list(), class = c("collector_double", 
    "collector")), sally = structure(list(), class = c("collector_double", 
    "collector")), rita = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

My current code uses mutate_at() and works fine:

trial1 <- test %>% 
  mutate_at(
    vars('bob','sally', 'rita'),
    funs(case_when(
      . > baseline ~ baseline, 
      . <= baseline ~ .)))

But when I try to update it to reflect across() from dplyr 1.0, I keep getting an error. Here is my attempt:

trial2 <- test %>% 
  mutate(across(c(bob, sally, rita), 
                case_when(. > baseline ~ baseline, 
                          . <= baseline ~ .)))

And here is the error:

Error: Problem with mutate() input ..1.
x . > baseline ~ baseline, . <= baseline ~ . must be length 36 or one, not 9, 4.
ℹ Input ..1 is across(...)

Any ideas what I might be doing wrong? Does case_when() work with across()?

James DeWeese asked Oct 03 '20 22:10


1 Answer

across() expects a function as its second argument, not the result of a call. Prefix the case_when() call with ~ to turn it into an anonymous (lambda) function:

library(dplyr)
test %>% 
   mutate(across(c(bob, sally, rita), 
             ~ case_when(. > baseline ~ baseline, 
                       . <= baseline ~ .)))

-output

# A tibble: 9 x 4
#  baseline   bob sally  rita
#     <dbl> <dbl> <dbl> <dbl>
#1        5     5     3     5
#2        7     7     5     4
#3        8     8     2     6
#4        4     4     2     4
#5        9     6     6     6
#6        1     1     1     0
#7        0     0     0     0
#8       46    46     4    11
#9       47    12    47     3
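
A possible reading of the original error, offered as an interpretation rather than something stated in the answer: without the ~, case_when() is evaluated immediately instead of being passed as a function, so the . refers to the whole piped tibble (9 rows x 4 columns = 36 values), which would account for "must be length 36 or one, not 9, 4". The sketch below rebuilds the example data as a plain tibble and shows two equivalent ways to pass a function; the object names with_lambda and with_function are only illustrative.

```r
library(dplyr)

# Same values as the question's dataset, without the readr "spec" attribute
test <- tibble(
  baseline = c(5, 7, 8, 4, 9, 1, 0, 46, 47),
  bob      = c(7, 11, 34, 9, 6, 8, 3, 49, 12),
  sally    = c(3, 5, 2, 2, 6, 1, 3, 4, 56),
  rita     = c(6, 4, 6, 7, 6, 0, 3, 11, 3)
)

# purrr-style lambda: .x is the current column, baseline is looked up in the data
with_lambda <- test %>%
  mutate(across(c(bob, sally, rita),
                ~ case_when(.x > baseline ~ baseline,
                            .x <= baseline ~ .x)))

# Ordinary anonymous function: the argument name (col) is arbitrary
with_function <- test %>%
  mutate(across(c(bob, sally, rita),
                function(col) case_when(col > baseline ~ baseline,
                                        col <= baseline ~ col)))
```

Both forms hand across() something it can call once per selected column, which is the difference from the failing attempt.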

Or by naming the .fns argument explicitly (the ~ is still required)

test %>% 
        mutate(across(c(bob, sally, rita), 
                  .fns = ~ case_when(. > baseline ~ baseline, 
                            . <= baseline ~ .)))

According to ?across, the .fns argument can be any of the following:

Functions to apply to each of the selected columns. Possible values are:

NULL, to return the columns untransformed.

A function, e.g. mean.

A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)

A list of functions/lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x)))
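
To make the documented forms concrete, here is a small sketch that applies a bare function, a lambda, and a named list to the example columns with summarise(). The summaries chosen (max, and a count of values above baseline) and the names by_function, by_lambda, by_list and n_over are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

test <- tibble(
  baseline = c(5, 7, 8, 4, 9, 1, 0, 46, 47),
  bob      = c(7, 11, 34, 9, 6, 8, 3, 49, 12),
  sally    = c(3, 5, 2, 2, 6, 1, 3, 4, 56),
  rita     = c(6, 4, 6, 7, 6, 0, 3, 11, 3)
)

# A bare function: passed by name, without parentheses
by_function <- test %>%
  summarise(across(c(bob, sally, rita), max))

# A purrr-style lambda: counts how many values exceed baseline
by_lambda <- test %>%
  summarise(across(c(bob, sally, rita), ~ sum(.x > baseline)))

# A named list: one output column per input column and list entry,
# named "<column>_<entry>" by default (e.g. bob_max, bob_n_over)
by_list <- test %>%
  summarise(across(c(bob, sally, rita),
                   list(max = max, n_over = ~ sum(.x > baseline))))
```

In each case the value given to .fns is something callable; a call such as case_when(...) without ~ is not.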


Also, instead of case_when(), we can use pmin(), which returns the element-wise (parallel) minimum of baseline and each column

test %>% 
    mutate(across(c(bob, sally, rita), ~ pmin(baseline, .)))

-output

# A tibble: 9 x 4
#  baseline   bob sally  rita
#     <dbl> <dbl> <dbl> <dbl>
#1        5     5     3     5
#2        7     7     5     4
#3        8     8     2     6
#4        4     4     2     4
#5        9     6     6     6
#6        1     1     1     0
#7        0     0     0     0
#8       46    46     4    11
#9       47    12    47     3
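
As a quick check that the two approaches agree, the sketch below runs both on the example data and compares the results. It also selects the columns from a character vector with all_of(), which may be closer to the quoted names used in the original vars() call; the names cols, capped_pmin and capped_case are hypothetical.

```r
library(dplyr)

test <- tibble(
  baseline = c(5, 7, 8, 4, 9, 1, 0, 46, 47),
  bob      = c(7, 11, 34, 9, 6, 8, 3, 49, 12),
  sally    = c(3, 5, 2, 2, 6, 1, 3, 4, 56),
  rita     = c(6, 4, 6, 7, 6, 0, 3, 11, 3)
)

# Column names held in a character vector, selected with all_of()
cols <- c("bob", "sally", "rita")

# Cap each column at baseline using the parallel minimum
capped_pmin <- test %>%
  mutate(across(all_of(cols), ~ pmin(baseline, .x)))

# Same rule written out with case_when()
capped_case <- test %>%
  mutate(across(all_of(cols),
                ~ case_when(.x > baseline ~ baseline,
                            .x <= baseline ~ .x)))

# TRUE when both approaches produce the same table
same_result <- isTRUE(all.equal(capped_pmin, capped_case))
```

For a simple cap like this one, pmin() is shorter; case_when() remains the more general tool when the replacement value depends on several conditions.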

akrun answered Oct 23 '22 10:10