Is `if` faster than ifelse?

Tags: performance, r, benchmarking, if-statement

When I was re-reading Hadley's Advanced R recently, I noticed that he says in Chapter 6 that `if` can be used as a function, like `if`(i == 1, print("yes"), print("no")). (If you have the physical book in hand, it's on page 80.)

We know that ifelse is slow (see "Does ifelse really calculate both of its vectors every time? Is it slow?") because it evaluates all of its arguments. Would `if` be a good alternative, given that it seems to evaluate only the branch selected by the condition (this is just my assumption)?
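A minimal sketch of the evaluation behaviour the question asks about. The helper mark() and the variable names are made up for illustration; it records which branch expressions actually get evaluated when `if` is called as a function and when ifelse is called.

```r
# Hypothetical helper: records a label as a side effect, then returns the value.
evaluated <- character(0)
mark <- function(label, value) {
  evaluated <<- c(evaluated, label)
  value
}

# `if` called as a function: only the selected branch is evaluated.
i <- 1
res_if <- `if`(i == 1, mark("yes", "yes"), mark("no", "no"))
evaluated_if <- evaluated

# ifelse with a mixed test: yes and no are each evaluated in full.
evaluated <- character(0)
res_mixed <- ifelse(c(TRUE, FALSE), mark("yes", c(1, 2)), mark("no", c(10, 20)))
evaluated_mixed <- evaluated

# ifelse with an all-TRUE test: the no argument is never evaluated.
evaluated <- character(0)
res_all_true <- ifelse(c(TRUE, TRUE), mark("yes", c(1, 2)), mark("no", c(10, 20)))
evaluated_all_true <- evaluated
```

So ifelse does not always evaluate both arguments, but as soon as the test contains both TRUE and FALSE values it evaluates each of them as a whole vector.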

Update: Based on the answers from @Benjamin and @Roman and the comments from @Gregor and many others, ifelse seems to be the better solution for vectorized calculations. I'm accepting @Benjamin's answer because it provides a more comprehensive comparison and for the benefit of the community. However, both answers (and the comments) are worth reading.

asked Nov 30 '15 18:11

Hao

1 Answer

This is more of an extended comment building on Roman's answer, but I need the code formatting to expound:

Roman is correct that `if` is faster than ifelse, but I am under the impression that the speed boost of `if` isn't particularly interesting, since it can't easily be harnessed through vectorization. That is to say, `if` is only advantageous over ifelse when the cond/test argument has length 1.
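A small sketch of the length-1 restriction, with made-up variable names. For a scalar condition `if` and ifelse agree; for a longer condition `if` cannot do the job. As far as I know, a condition of length greater than 1 is an error in R 4.2.0 and later and a warning (with only the first element used) in earlier versions, so the sketch captures either outcome rather than assuming one.

```r
# Scalar condition: `if` and ifelse give the same answer.
x <- -3
scalar_if <- `if`(x < 0, x^2, x)
scalar_ifelse <- ifelse(x < 0, x^2, x)

# Vector condition: `if` signals a warning or an error, depending on the R version.
v <- c(-3, 2)
vector_outcome <- tryCatch(
  {
    `if`(v < 0, v^2, v)
    "no condition signalled"
  },
  warning = function(w) "warning",
  error = function(e) "error"
)

# ifelse handles the vector condition element by element.
vector_ifelse <- ifelse(v < 0, v^2, v)
```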

Consider the following functions, which are an admittedly weak attempt at vectorizing `if` without the side effect of evaluating both the yes and no arguments, as ifelse does.

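# Loop over the elements and call `if` once per element.
ifelse2 <- function(test, yes, no){
  result <- rep(NA, length(test))
  for (i in seq_along(test)){
    result[i] <- `if`(test[i], yes[i], no[i])
  }
  result
}

# Same idea as ifelse2, using sapply instead of an explicit loop.
ifelse2a <- function(test, yes, no){
  sapply(seq_along(test),
         function(i) `if`(test[i], yes[i], no[i]))
}

# Fill the result by logical subsetting; assumes yes and no are as long as test.
ifelse3 <- function(test, yes, no){
  result <- rep(NA, length(test))
  logic <- test
  result[logic] <- yes[logic]
  result[!logic] <- no[!logic]
  result
}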


set.seed(pi)
x <- rnorm(1000)

library(microbenchmark)
microbenchmark(
  standard = ifelse(x < 0, x^2, x),
  modified = ifelse2(x < 0, x^2, x),
  modified_apply = ifelse2a(x < 0, x^2, x),
  third = ifelse3(x < 0, x^2, x),
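  # only correct when yes and no have length 1 (see the note at the end)
  fourth = c(x, x^2)[1L + ( x < 0 )],
  # offsets into c(x, x^2) so that vector yes and no are handled
  fourth_modified = c(x, x^2)[seq_along(x) + length(x) * (x < 0)]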
)

Unit: microseconds
            expr     min      lq      mean  median       uq      max neval cld
        standard  52.198  56.011  97.54633  58.357  68.7675 1707.291   100 ab 
        modified  91.787  93.254 131.34023  94.133  98.3850 3601.967   100  b 
  modified_apply 645.146 653.797 718.20309 661.568 676.0840 3703.138   100   c
           third  20.528  22.873  76.29753  25.513  27.4190 3294.350   100 ab 
          fourth  15.249  16.129  19.10237  16.715  20.9675   43.695   100 a  
 fourth_modified  19.061  19.941  22.66834  20.528  22.4335   40.468   100 a

Edit: Thanks to Frank and Richard Scriven for noticing my shortcomings.

As you can see, breaking up the vector so that it can be passed to `if` is time-consuming and ends up being slower than just running ifelse (which is probably why no one has bothered to implement my solution).

If you're really desperate for an increase in speed, you can use the ifelse3 approach above. Or better yet, Frank's less obvious* but brilliant solution (fourth in the benchmark).

* By 'less obvious' I mean it took me two seconds to realize what he did. And per nicola's comment below, please note that this works only when yes and no have length 1; otherwise you'll want to stick with ifelse3.
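To illustrate the length-1 caveat, here is a small sketch of the indexing trick on a hand-picked vector (the values and variable names are only for illustration). With scalar yes and no, 1L + test picks between two elements; with vector yes and no the same index only ever reads the first two elements, while the offset index used in fourth_modified reads the matching position.

```r
x <- c(-2, -1, 0.5, 3, -0.25)
test <- x < 0
expected <- ifelse(test, x^2, x)

# Scalar yes/no: element 1 is the "no" value, element 2 the "yes" value.
scalar_pick <- c("non-negative", "negative")[1L + test]
scalar_ok <- identical(scalar_pick, ifelse(test, "negative", "non-negative"))

# Vector yes/no with the same index: only x[1] and x[2] are ever selected.
naive <- c(x, x^2)[1L + test]
naive_ok <- isTRUE(all.equal(naive, expected))

# Offset index: position i when test is FALSE, position length(x) + i when TRUE.
offset <- c(x, x^2)[seq_along(x) + length(x) * test]
offset_ok <- identical(offset, expected)
```

Here scalar_ok and offset_ok are TRUE and naive_ok is FALSE, which is why the plain 1L + test form should be reserved for length-1 yes and no.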

answered Sep 19 '22 17:09

Benjamin
