I am fairly proficient within the Tidyverse, but have always used ifelse() instead of dplyr's if_else(). I want to switch this behavior and default to always using dplyr::if_else(), deprecating ifelse() from my code.
Is there any reason not to do this? Would this likely get me into trouble? I'll spare you the details, but recently, not using if_else() screwed me up, when I unknowingly created a column of character matrices in my data analysis. If I switch to always using if_else(), I hope to avoid this issue in the future.
if_else() expects its first argument to evaluate to a logical (TRUE/FALSE) vector. If the result is anything else, such as an integer or a character string, it throws an error. ifelse(), on the other hand, is very loose about its test condition: it coerces whatever it is given to logical.
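As a rough illustration of the difference in how the two functions treat the condition, here is a minimal sketch. The vector `flags` and its values are made up for the example, and the dplyr error is caught with tryCatch() because the exact message varies between dplyr versions.

```r
library(dplyr)

flags <- c(1, 0, 1)  # numeric, not logical (illustrative values)

# Base ifelse() coerces the test to logical: non-zero is TRUE, zero is FALSE
base_res <- ifelse(flags, "yes", "no")
base_res
#> [1] "yes" "no"  "yes"

# if_else() refuses a non-logical condition; tryCatch() records that it failed
strict_res <- tryCatch(if_else(flags, "yes", "no"), error = function(e) "error")
strict_res
#> [1] "error"

# Writing the comparison out explicitly satisfies if_else()
explicit_res <- if_else(flags != 0, "yes", "no")
```

Spelling out the comparison (`flags != 0`) gives the same result as the base call while making the intent visible.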
if_else() checks that its true and false arguments are the same type. This strictness makes the output type more predictable, and makes it somewhat faster.
if_else is more strict. It checks that both alternatives are of the same type and otherwise throws an error, while ifelse will promote types as necessary. This may be a benefit in some circumstances, but may otherwise break scripts if you don't check for errors or explicitly force type conversion. For example:
ifelse(c(TRUE,TRUE,FALSE),"a",3)
[1] "a" "a" "3"
if_else(c(TRUE,TRUE,FALSE),"a",3)
Error: `false` must be type character, not double
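One way to keep the strict behaviour and still get the result ifelse() produced is to do the type conversion yourself. The following is only a sketch; the variable names are illustrative, and the mismatch is caught with tryCatch() since the wording of the error differs between dplyr versions.

```r
library(dplyr)

cond <- c(TRUE, TRUE, FALSE)

# Converting the numeric branch explicitly makes both branches character
res <- if_else(cond, "a", as.character(3))
res
#> [1] "a" "a" "3"

# Mixing a character and a numeric branch is rejected rather than coerced
mismatch <- tryCatch(if_else(cond, "a", 3), error = function(e) "error")
mismatch
#> [1] "error"
```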
Another reason to choose if_else over ifelse is that ifelse turns Date objects into numeric ones:
Dates <- as.Date(c('2018-10-01', '2018-10-02', '2018-10-03'))
new_Dates <- ifelse(Dates == '2018-10-02', Dates + 1, Dates)
str(new_Dates)
#> num [1:3] 17805 17807 17807
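For comparison, a sketch of the same operation with if_else(), which should keep the Date class because both branches are Dates. The vector is redefined here (as `dates`) so the snippet runs on its own.

```r
library(dplyr)

dates <- as.Date(c("2018-10-01", "2018-10-02", "2018-10-03"))

# Both branches are Date vectors, so the result stays a Date
new_dates <- if_else(dates == as.Date("2018-10-02"), dates + 1, dates)
class(new_dates)
#> [1] "Date"
new_dates
#> [1] "2018-10-01" "2018-10-03" "2018-10-03"
```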
if_else is also faster than ifelse.
Note that when testing multiple conditions, the code would be more readable and less error-prone if we use case_when.
library(dplyr)
case_when(
  Dates == '2018-10-01' ~ Dates - 1,
  Dates == '2018-10-02' ~ Dates + 1,
  Dates == '2018-10-03' ~ Dates + 2,
  TRUE ~ Dates
)
#> [1] "2018-09-30" "2018-10-03" "2018-10-05"
Created on 2018-06-01 by the reprex package (v0.2.0).
I'd also add that if_else() can assign a value when the condition is NA, via its missing argument, which is a handy way of adding an extra condition.
df <- tibble(val = c(80, 90, NA, 110))  # data_frame() is deprecated in favour of tibble()
df %>% mutate(category = if_else(val < 100, 1, 2, missing = 9))
#     val category
#   <dbl>    <dbl>
# 1    80        1
# 2    90        1
# 3    NA        9
# 4   110        2
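To show what the missing argument saves you, here is a sketch comparing it with a base R equivalent that needs a nested ifelse() and an explicit is.na() check. The variable names are illustrative only.

```r
library(dplyr)

val <- c(80, 90, NA, 110)

# dplyr: the NA case is handled by the `missing` argument
with_missing <- if_else(val < 100, 1, 2, missing = 9)
with_missing
#> [1] 1 1 9 2

# base R: the NA case needs its own outer ifelse()
base_equiv <- ifelse(is.na(val), 9, ifelse(val < 100, 1, 2))
identical(with_missing, base_equiv)
#> [1] TRUE
```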
Another important reason for preferring if_else() to ifelse() is checking for consistency in lengths. See this dangerous gotcha:
> tibble(x = 1:3, y = ifelse(TRUE, x, 4:6))
# A tibble: 3 x 2
      x     y
  <int> <int>
1     1     1
2     2     1
3     3     1
Compare with
> tibble(x = 1:3, y = if_else(TRUE, x, 4:6))
Error: `true` must be length 1 (length of `condition`), not 3.
The intention in both cases is clearly for column y to equal x or to equal 4:6 according to the value of a single (scalar) logical variable; ifelse() silently truncates its output to length 1, which is then silently recycled. if_else() catches what is almost certainly an error at source.
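When the condition really is a single logical value, one option is to skip the vectorised functions and use an ordinary if/else expression, which returns whichever whole vector is selected. This is a sketch only; `use_x` is a hypothetical flag standing in for the scalar condition.

```r
library(dplyr)

use_x <- TRUE  # hypothetical scalar flag

# A plain if/else expression picks one whole vector, so nothing is truncated
res <- tibble(x = 1:3, y = if (use_x) x else 4:6)
res$y
#> [1] 1 2 3
```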