(Background info: ifelse evaluates both of the expressions, even though only one will be returned. EDIT: This is an incorrect statement. See Tommy's reply)
Is there any example where it makes sense to use ifelse in a non-vectorized situation? I think that "readability" could be a valid answer when we don't care about small efficiency gains, but besides that, is it ever faster/equivalent/better-in-some-other-way to use ifelse when an if and then else would do the job?
Similarly, if I have a vectorized situation, is ifelse always the best tool to use? It seems strange that both expressions are evaluated. Is it ever faster to loop through one by one and do a normal if and then else? I'm guessing it would make sense only if evaluating the expressions took a really long time. Is there any other alternative that would not involve an explicit loop?
Thanks
First, ifelse does NOT always evaluate both expressions - only if there are both TRUE and FALSE elements in the test vector.
ifelse(TRUE, 'foo', stop('bar')) # "foo"
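To see which arguments actually get evaluated, here is a minimal sketch. The helpers yes_value() and no_value() are hypothetical and exist only to record when they are called; they are not part of R.

```r
# Hypothetical helpers that log each time they are evaluated.
calls <- character(0)
yes_value <- function() { calls <<- c(calls, "yes"); "foo" }
no_value  <- function() { calls <<- c(calls, "no");  "bar" }

# All-TRUE test: only the `yes` argument is forced.
calls <- character(0)
r_all_true <- ifelse(c(TRUE, TRUE), yes_value(), no_value())
calls_all_true <- calls

# Mixed test: both arguments are forced, each exactly once.
calls <- character(0)
r_mixed <- ifelse(c(TRUE, FALSE), yes_value(), no_value())
calls_mixed <- calls

print(calls_all_true)
print(calls_mixed)
```

Note that each argument is evaluated at most once as a whole vector, not once per element, so an expensive expression is not repeated for every TRUE or FALSE in the test.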
And in my opinion: ifelse should not be used in a non-vectorized situation. It is always slower and more error-prone to use ifelse over if / else:
# This is fairly common if/else code
if (length(letters) > 0) letters else LETTERS
# But this "equivalent" code will yield a very different result - TRY IT!
ifelse(length(letters) > 0, letters, LETTERS)
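For readers who do not have R at hand, a short sketch of what the two calls return. The variable names scalar_if and scalar_ifelse are only for illustration.

```r
# if/else returns the whole chosen object.
scalar_if <- if (length(letters) > 0) letters else LETTERS

# ifelse() shapes its result like the test, which has length 1 here,
# so only the first element of `letters` survives.
scalar_ifelse <- ifelse(length(letters) > 0, letters, LETTERS)

print(length(scalar_if))
print(scalar_ifelse)
```

The first result is the full 26-element vector, while the second is just "a", because the result of ifelse takes its length from the test rather than from yes or no.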
In vectorized situations though, ifelse can be a good choice - but beware that the length and attributes of the result might not be what you expect (as above, and I consider ifelse broken in that respect).
Here's an example: tst is of length 5 and has a class. I'd expect the result to be of length 10 and have no class, but that isn't what happens - it gets an incompatible class and length 5!
# a logical vector of class 'mybool'
tst <- structure(1:5 %%2 > 0, class='mybool')
# produces a numeric vector of class 'mybool'!
ifelse(tst, 101:110, 201:210)
#[1] 101 202 103 204 105
#attr(,"class")
#[1] "mybool"
Why would I expect the length to be 10? Because most functions in R "cycle" the shorter vector to match the longer:
1:5 + 1:10 # returns a vector of length 10.
...But ifelse only cycles the yes/no arguments to match the length of the test argument.
Why would I expect the class (and other attributes) to not be copied from the test object? Because <, which returns a logical vector, does not copy class and attributes from its (typically numeric) arguments. It doesn't do that because it would typically be very wrong.
1:5 < structure(1:10, class='mynum') # returns a logical vector without class
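If you do want the "usual" R behaviour, one possible workaround (a sketch, not the only way) is to strip the attributes from the test and recycle it yourself before calling ifelse. The names plain_test and recycled are just for illustration.

```r
# Same classed test vector as in the example above.
tst <- structure(1:5 %% 2 > 0, class = 'mybool')

# ifelse() only recycles yes/no to the length of the test:
short <- ifelse(c(TRUE, FALSE), 1:4, 11:14)

# Drop the class and recycle the test to the length we actually want.
plain_test <- rep(as.logical(unclass(tst)), length.out = 10)
recycled <- ifelse(plain_test, 101:110, 201:210)

print(short)
print(recycled)
print(attributes(recycled))
```

Here short has length 2 (the length of the test), while recycled has length 10 and carries no class.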
Finally, can it be more efficient to "do it yourself"? Well, it seems that ifelse is not a primitive like if, and it needs some special code to handle NA. If you don't have NAs, it can be faster to do it yourself.
tst <- 1:1e7 %%2 == 0
a <- rep(1, 1e7)
b <- rep(2, 1e7)
system.time( r1 <- ifelse(tst, a, b) ) # 2.58 sec
# If we know that a and b are of the same length as tst, and that
# tst doesn't have NAs, then we can do like this:
system.time( { r2 <- b; r2[tst] <- a[tst]; r2 } ) # 0.46 secs
identical(r1, r2) # TRUE
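If the test might contain NAs, the assignment r2[tst] <- a[tst] is no longer safe. A minimal sketch of a wrapper that handles this case follows; simple_ifelse is a hypothetical name, and the function assumes that test is logical and that yes and no have the same length and type as test. It has not been benchmarked here.

```r
# Hypothetical helper: a do-it-yourself ifelse for equal-length inputs.
simple_ifelse <- function(test, yes, no) {
  stopifnot(is.logical(test),
            length(yes) == length(test),
            length(no) == length(test))
  out <- no
  pick <- which(test)       # which() silently drops NA positions
  out[pick] <- yes[pick]
  out[is.na(test)] <- NA    # mirror ifelse(): an NA test gives an NA result
  out
}

tst_na <- c(TRUE, NA, FALSE)
a_small <- c(1, 1, 1)
b_small <- c(2, 2, 2)
r_diy <- simple_ifelse(tst_na, a_small, b_small)
r_ref <- ifelse(tst_na, a_small, b_small)
print(identical(r_diy, r_ref))
```

Unlike ifelse, this version keeps the attributes of no rather than those of the test, which may or may not be what you want.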
On your second point, how do you define "best"? I think ifelse() is one of the more readable solutions, but may not always be the fastest. Specifically, I've found that writing out boolean conditions and adding them together can give you some performance benefits. Here's a quick example:
> x <- rnorm(1e6)
> system.time(y1 <- ifelse(x > 0,1,2))
   user  system elapsed
   0.46    0.08    0.53
> system.time(y2 <- (x > 0) * 1 + (x <= 0) * 2)
   user  system elapsed
   0.06    0.00    0.06
> identical(y1, y2)
[1] TRUE
So, if speed is your biggest concern, the boolean approach may be better. However, for most of my purposes, I've found ifelse() quick enough and easy to grok. Your mileage may vary, obviously.
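As a small sanity check on the boolean approach, the sketch below compares it with ifelse() on a tiny vector that includes 0 and NA. The lookup-by-index variant (by_index) is an extra alternative added here for comparison, not something from the timings above, and no speed claim is made for it.

```r
x_small <- c(-1.5, 0, 2.3, NA)

by_ifelse  <- ifelse(x_small > 0, 1, 2)
by_boolean <- (x_small > 0) * 1 + (x_small <= 0) * 2

# Alternative sketch: index into a two-element lookup vector.
# FALSE maps to position 1 (value 2), TRUE to position 2 (value 1).
by_index <- c(2, 1)[(x_small > 0) + 1L]

print(by_ifelse)
print(identical(by_ifelse, by_boolean))
print(identical(by_ifelse, by_index))
```

All three propagate NA in the same way here. The boolean arithmetic only works when both outcomes are numeric, whereas the lookup form would also work for character outcomes.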