Does ifelse really calculate both the yes and no vectors -- as in, the entirety of each vector? Or does it just calculate some values from each vector?
Also, is ifelse really that slow?
Yes, ifelse calculates both its yes value and its no value, except when the test condition is either all TRUE or all FALSE.
We can see this by generating random numbers inside the call and counting how many were actually drawn. To count them, reset the seed afterwards and find where the next random number falls in the regenerated sequence.
    # TEST CONDITION, ALL TRUE
    set.seed(1)
    dump <- ifelse(rep(TRUE, 200), rnorm(200), rnorm(200))
    next.random.number.after.all.true <- rnorm(1)

    # TEST CONDITION, ALL FALSE
    set.seed(1)
    dump <- ifelse(rep(FALSE, 200), rnorm(200), rnorm(200))
    next.random.number.after.all.false <- rnorm(1)

    # TEST CONDITION, MIXED
    set.seed(1)
    dump <- ifelse(c(FALSE, rep(TRUE, 199)), rnorm(200), rnorm(200))
    next.random.number.after.some.TRUE.some.FALSE <- rnorm(1)

    # RESET THE SEED, GENERATE SEVERAL RANDOM NUMBERS TO SEARCH FOR A MATCH
    set.seed(1)
    r.1000 <- rnorm(1000)

    cat("Quantity of random numbers generated during the `ifelse` statement when:",
        "\n\tAll True  ", which(r.1000 == next.random.number.after.all.true) - 1,
        "\n\tAll False ", which(r.1000 == next.random.number.after.all.false) - 1,
        "\n\tMixed T/F ", which(r.1000 == next.random.number.after.some.TRUE.some.FALSE) - 1
    )

Gives the following output:
    Quantity of random numbers generated during the `ifelse` statement when:
        All True   200
        All False  200
        Mixed T/F  400   <~~ Notice TWICE AS MANY numbers were generated when `test` had both T & F values present

The reason is visible in the source of ifelse itself:

    .
    .
    if (any(test[!nas]))
        ans[test & !nas] <- rep(yes, length.out = length(ans))[test &    # <~~~~ This line and the one below
            !nas]
    if (any(!test[!nas]))
        ans[!test & !nas] <- rep(no, length.out = length(ans))[!test &   # <~~~~ ... are the culprits
            !nas]
    .
    .

Notice that the yes and no arguments are computed only if there is some non-NA value of test that is TRUE or FALSE (respectively). At that point, and this is the important part when it comes to efficiency, the entirety of each vector is computed.
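The same behaviour can be checked without random numbers. Below is a minimal sketch, assuming a hypothetical helper note() (it is not part of R or of the code above) that records which argument was evaluated and how long the resulting vector is. It relies only on R's lazy evaluation of function arguments.

```r
# note() is a hypothetical helper for illustration: it logs which argument
# was evaluated and the length of the vector, then returns it unchanged.
evaluated <- character(0)
note <- function(label, x) {
  evaluated <<- c(evaluated, sprintf("%s:%d", label, length(x)))
  x
}

# Mixed test: both arguments are evaluated, each at full length,
# even though only some elements of each end up in the result.
mixed <- ifelse(c(TRUE, FALSE, TRUE, NA),
                note("yes", c(10, 20, 30, 40)),
                note("no",  c(-1, -2, -3, -4)))
log.mixed <- evaluated

# All-TRUE test: the no argument is never evaluated.
evaluated <- character(0)
all.true <- ifelse(c(TRUE, TRUE),
                   note("yes", c(1, 2)),
                   note("no",  c(3, 4)))
log.all.true <- evaluated

print(log.mixed)     # "yes:4" "no:4"
print(log.all.true)  # "yes:2"
```

In the mixed case only two of the four yes elements and one of the four no elements are used, yet all eight were computed.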
Let's see if we can test it:
    library(microbenchmark)

    # Create some sample data
    N <- 1e4
    set.seed(1)
    X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
    Y <- ifelse(is.na(X), rnorm(X), NA)  # Y has the reverse NA/not-NA pattern of X

    yesifelse <- quote(sort(ifelse(is.na(X), Y+17, X-17 ) ))
    noiflese  <- quote(sort(c(Y[is.na(X)]+17, X[is.na(Y)]-17)))

    identical(eval(yesifelse), eval(noiflese))
    # [1] TRUE

    microbenchmark(eval(yesifelse), eval(noiflese), times=50L)

Results for two values of N:

    N = 1,000
    Unit: milliseconds
                expr      min       lq   median       uq      max neval
     eval(yesifelse) 2.286621 2.348590 2.411776 2.537604 10.05973    50
      eval(noiflese) 1.088669 1.093864 1.122075 1.149558 61.23110    50

    N = 10,000
    Unit: milliseconds
                expr      min       lq   median       uq      max neval
     eval(yesifelse) 30.32039 36.19569 38.50461 40.84996 98.77294    50
      eval(noiflese) 12.70274 13.58295 14.38579 20.03587 21.68665    50
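The benchmark sorts both results, so it does not depend on element order. If the original order matters, one possible alternative is to preallocate the result and fill it by logical index, so each expression is computed only for the elements that need it. This is a sketch under that assumption, not a drop-in replacement for every use of ifelse. The names na.x and by.index are introduced here for illustration, and no timings are claimed for it.

```r
# Same sample data as in the benchmark above
N <- 1e4
set.seed(1)
X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
Y <- ifelse(is.na(X), rnorm(X), NA)

na.x <- is.na(X)               # illustrative name: the test condition

# Preallocate, then compute each branch only where it applies
by.index <- numeric(N)
by.index[na.x]  <- Y[na.x]  + 17
by.index[!na.x] <- X[!na.x] - 17

# Same values, in the same order, as the ifelse version
with.ifelse <- ifelse(na.x, Y + 17, X - 17)
stopifnot(isTRUE(all.equal(by.index, with.ifelse)))
```

This works here because the test condition itself contains no NA values. With NA in the test, those positions would need separate handling to match what ifelse returns.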