I'm curious why the following produces an "NAs introduced by coercion" warning:
# Example dataframe
library(dplyr)
df <- tibble(
  session = c("a", 2)
)
df %>%
  mutate(sessionNum = case_when(
    session == "a" ~ 1,
    TRUE ~ as.numeric(session)
  ))
I thought nothing would need to be coerced to NA, since "a" is already handled by the first case_when() branch.
Even the following dataframe produces the warning!
# Example dataframe
df <- tibble(
  session = c("a")
)
It seems that dplyr's if_else() and case_when() evaluate every RHS in full, regardless of whether the condition is TRUE. According to Hadley Wickham, this is needed for type stability.
https://github.com/tidyverse/dplyr/issues/5321
Because the RHS is evaluated for all values, you are coercing the whole character vector to numeric (i.e., as.numeric(c('a', '2'))), and the "a" element triggers the warning.
Also noted here: https://github.com/tidyverse/dplyr/issues/5341
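One way to sidestep the warning, sketched under the assumption that "a" is the only non-numeric value in the column: do the substitution while the data is still character, and convert to numeric once at the end. The replacement string "1" mirrors the value used in the question; nothing here is specific to the linked issues.

```r
library(dplyr)

df <- tibble(session = c("a", "2"))

# Replace "a" with "1" as a string first, so as.numeric() never sees "a".
out <- df %>%
  mutate(
    sessionNum = as.numeric(
      if_else(session == "a", "1", session)
    )
  )

out$sessionNum
```

If the column can hold other non-numeric strings, they would still become NA with a warning, so this only helps when the set of special values is known in advance.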
I believe it's because case_when() is vectorized: each RHS is computed for the whole input vector, and only afterwards are the results picked element by element.
Taking a slightly simpler example:
library(dplyr)
session <- c("a", 2)
session
#> [1] "a" "2"
case_when(
  session == "a" ~ 1,
  .default = as.numeric(session)
)
#> Warning in vec_case_when(conditions = conditions, values = values,
#> conditions_arg = "", : NAs introduced by coercion
#> [1] 1 2
This is roughly equivalent to the base R code:
session <- c("a", 2)
condition <- session == "a"
result_if_true <- rep(1, length(session))  # recycle the scalar to full length
result_if_false <- as.numeric(session)     # computed for every element, including "a"
#> Warning: NAs introduced by coercion
result <- numeric(length(session))
result[condition] <- result_if_true[condition]
result[!condition] <- result_if_false[!condition]
result
#> [1] 1 2
So, even though as.numeric("a") is not present in the final result, it is computed in an intermediate step.
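The same point can be checked directly with a small experiment. This is only a sketch: counting_as_numeric() and the calls counter are hypothetical helpers made up for illustration, and the input is the single-element vector from the question, where no element ever falls through to the second branch.

```r
library(dplyr)

# Hypothetical helper: records each call, then converts quietly.
calls <- 0
counting_as_numeric <- function(x) {
  calls <<- calls + 1
  suppressWarnings(as.numeric(x))
}

session <- c("a")

result <- case_when(
  session == "a" ~ 1,
  TRUE ~ counting_as_numeric(session)
)

# A non-zero count means the second RHS was evaluated even though
# the first condition already matched every element.
calls
```

If the counter is non-zero after the call, the fallback RHS ran even though its value was never used, which is exactly why the single "a" data frame also warns.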