I'm trying to translate a mutate_at() to a mutate() using dplyr's new "across" function and a bit stumped.
In a nutshell, I need to compare the values in a series of columns to a "baseline" column. When the values in the columns are higher than the baseline, I need to use the baseline value. When the values in the columns are lower than or equal to the baseline, I need to keep the value. Here's an example dataset (my actual dataset is much larger):
test <- structure(list(baseline = c(5, 7, 8, 4, 9, 1, 0, 46, 47), bob = c(7,
11, 34, 9, 6, 8, 3, 49, 12), sally = c(3, 5, 2, 2, 6, 1, 3, 4,
56), rita = c(6, 4, 6, 7, 6, 0, 3, 11, 3)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), spec = structure(list(
cols = list(baseline = structure(list(), class = c("collector_double",
"collector")), bob = structure(list(), class = c("collector_double",
"collector")), sally = structure(list(), class = c("collector_double",
"collector")), rita = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
My current code uses mutate_at() and works fine:
trial1 <- test %>%
mutate_at(
vars('bob','sally', 'rita'),
funs(case_when(
. > baseline ~ baseline,
. <= baseline ~ .)))
But when I try to update it to reflect across() from dplyr 1.0, I keep getting an error. Here is my attempt:
trial2 <- test %>%
mutate(across(c(bob, sally, rita),
case_when(. > baseline ~ baseline,
. <= baseline ~ .)))
And here is the error:
error: Problem with
mutate()
input..1
. x. > baseline ~ baseline
,. <= baseline ~ .
must be length 36 or one, not 9, 4. ℹ Input..1
isacross(...)
Any ideas what I might be doing wrong? Does case_when() work with across?
%>% is called the forward pipe operator in R. It provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).
case_when.Rd. This function allows you to vectorise multiple if_else() statements. It is an R equivalent of the SQL CASE WHEN statement.
Similarly to readr , dplyr and tidyr are also part of the tidyverse. These packages were loaded in R's memory when we called library(tidyverse) earlier.
We can use the ~
to specify the anonymous function/lambda function call
library(dplyr)
test %>%
mutate(across(c(bob, sally, rita),
~ case_when(. > baseline ~ baseline,
. <= baseline ~ .)))
-output
# A tibble: 9 x 4
# baseline bob sally rita
# <dbl> <dbl> <dbl> <dbl>
#1 5 5 3 5
#2 7 7 5 4
#3 8 8 2 6
#4 4 4 2 4
#5 9 6 6 6
#6 1 1 1 0
#7 0 0 0 0
#8 46 46 4 11
#9 47 12 47 3
Or with .funs
argument
test %>%
mutate(across(c(bob, sally, rita),
.funs = case_when(. > baseline ~ baseline,
. <= baseline ~ .)))
According to ?across
the arguments to fns
can be either
Functions to apply to each of the selected columns. Possible values are:
NULL, to returns the columns untransformed.
A function, e.g. mean.
A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
A list of functions/lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x))
Also, instead of case_when
, we can make use of the pmin
test %>%
mutate(across(c(bob, sally, rita), ~ pmin(baseline, .)))
-output
# A tibble: 9 x 4
# baseline bob sally rita
# <dbl> <dbl> <dbl> <dbl>
#1 5 5 3 5
#2 7 7 5 4
#3 8 8 2 6
#4 4 4 2 4
#5 9 6 6 6
#6 1 1 1 0
#7 0 0 0 0
#8 46 46 4 11
#9 47 12 47 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With