dplyr if_else() vs base R ifelse()

I am fairly proficient with the Tidyverse, but have always used ifelse() instead of dplyr's if_else(). I want to change this habit: default to dplyr::if_else() everywhere and phase ifelse() out of my code.

Is there any reason not to do this? Would this likely get me into trouble? I'll spare you the details, but recently, not using if_else() screwed me up, when I unknowingly created a column of character matrices in my data analysis. If I switch to always using if_else() I hope to avoid this issue in the future.

stackinator asked Jun 01 '18 14:06


People also ask

What is the difference between ifelse and if_else in R?

if_else() expects its first argument to be a logical (TRUE/FALSE) vector. If it is anything else, such as an integer or a text string, it throws an error. ifelse(), on the other hand, is much looser about its test condition and coerces it to logical where it can.
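As a rough illustration of that difference, the sketch below passes a numeric vector as the condition to both functions. The variable names `loose` and `strict` are made up for this example, and the error is caught with tryCatch() only so the snippet runs to the end.

```r
library(dplyr)

# Base ifelse() coerces a numeric test to logical:
# non-zero values count as TRUE, zero counts as FALSE.
loose <- ifelse(c(1, 0, 2), "yes", "no")
print(loose)
#> [1] "yes" "no"  "yes"

# if_else() rejects the same input; capture the error instead of stopping.
strict <- tryCatch(
  if_else(c(1, 0, 2), "yes", "no"),
  error = function(e) "error"
)
print(strict)
#> [1] "error"
```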

What does if_else do in R?

It checks that true and false are the same type. This strictness makes the output type more predictable, and makes it somewhat faster.


4 Answers

if_else is stricter. It checks that both alternatives are of the same type and throws an error otherwise, while ifelse promotes types as necessary. The strictness can be a benefit, but it can also break scripts unless you handle the error or convert types explicitly. For example:

ifelse(c(TRUE,TRUE,FALSE),"a",3)
[1] "a" "a" "3"
if_else(c(TRUE,TRUE,FALSE),"a",3)
Error: `false` must be type character, not double
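
One way to satisfy the type check is to convert one alternative explicitly. The following is a minimal sketch of that idea, not part of the original answer; the names `cond`, `mixed` and `fixed` are illustrative, and the exact wording of the error message depends on the installed dplyr version.

```r
library(dplyr)

cond <- c(TRUE, TRUE, FALSE)

# Mixed types are rejected, so the error is caught here for illustration.
mixed <- tryCatch(
  if_else(cond, "a", 3),
  error = function(e) "error"
)

# Converting the numeric alternative to character makes the types agree.
fixed <- if_else(cond, "a", as.character(3))
print(fixed)
#> [1] "a" "a" "3"
```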
James answered Oct 04 '22 14:10


Another reason to choose if_else over ifelse is that ifelse strips the Date class and returns the underlying numeric values (days since 1970-01-01):

Dates <- as.Date(c('2018-10-01', '2018-10-02', '2018-10-03'))
new_Dates <- ifelse(Dates == '2018-10-02', Dates + 1, Dates)
str(new_Dates)

#>  num [1:3] 17805 17807 17807
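
For comparison, here is a sketch of the same operation with if_else(), which keeps the Date class because both alternatives are Dates. It reuses the `Dates` vector from above and compares against an explicit as.Date() value to keep the types unambiguous.

```r
library(dplyr)

Dates <- as.Date(c("2018-10-01", "2018-10-02", "2018-10-03"))

# Both `true` and `false` are Date vectors, so the result stays a Date.
new_Dates <- if_else(Dates == as.Date("2018-10-02"), Dates + 1, Dates)
print(class(new_Dates))
#> [1] "Date"
print(new_Dates)
#> [1] "2018-10-01" "2018-10-03" "2018-10-03"
```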

if_else is also reported to be somewhat faster than ifelse, a side effect of its stricter type handling.

Note that when testing multiple conditions, the code is more readable and less error-prone with case_when:

library(dplyr)

case_when(
  Dates == '2018-10-01' ~ Dates - 1,
  Dates == '2018-10-02' ~ Dates + 1,
  Dates == '2018-10-03' ~ Dates + 2,
  TRUE ~ Dates
)

#> [1] "2018-09-30" "2018-10-03" "2018-10-05"

Created on 2018-06-01 by the reprex package (v0.2.0).

Tung answered Oct 04 '22 16:10


I'd also add that if_else() can assign a value where the condition is NA, through its missing argument, which is a handy way of adding an extra condition.

library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(val = c(80, 90, NA, 110))
df %>% mutate(category = if_else(val < 100, 1, 2, missing = 9))

#     val category
#   <dbl>    <dbl>
# 1    80        1
# 2    90        1
# 3    NA        9
# 4   110        2
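
To make the contrast explicit, the sketch below runs the same test three ways on a plain vector. The names `base_result`, `default_result` and `filled_result` are illustrative only.

```r
library(dplyr)

val <- c(80, 90, NA, 110)

# Base ifelse() propagates NA wherever the test is NA.
base_result <- ifelse(val < 100, 1, 2)
print(base_result)
#> [1]  1  1 NA  2

# if_else() does the same when `missing` is not supplied.
default_result <- if_else(val < 100, 1, 2)

# With `missing`, NA conditions get a replacement value instead.
filled_result <- if_else(val < 100, 1, 2, missing = 9)
print(filled_result)
#> [1] 1 1 9 2
```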
Joe answered Oct 04 '22 16:10


Another important reason for preferring if_else() to ifelse() is checking for consistency in lengths. See this dangerous gotcha:

> tibble(x = 1:3, y = ifelse(TRUE, x, 4:6))
# A tibble: 3 x 2
      x     y
  <int> <int>
1     1     1
2     2     1
3     3     1

Compare with

> tibble(x = 1:3, y = if_else(TRUE, x, 4:6))
Error: `true` must be length 1 (length of `condition`), not 3.

The intention in both cases is clearly for column y to equal x or to equal 4:6 according to the value of a single (scalar) logical variable; ifelse() silently truncates its output to length 1, which is then silently recycled. if_else() catches what is almost certainly an error at source.
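
When the condition really is a single value, a plain if ... else expression is one way to state that intent, since it returns the chosen vector whole. This is a sketch under that assumption; `use_x` is a hypothetical flag that does not appear in the example above.

```r
library(tibble)

use_x <- TRUE  # hypothetical scalar flag, introduced for illustration

# if (...) else ... evaluates one condition and returns an entire vector,
# so no truncation or recycling takes place.
result <- tibble(x = 1:3, y = if (use_x) x else 4:6)
print(result$y)
#> [1] 1 2 3
```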

ChrisW answered Oct 04 '22 16:10