It is often said that one should prefer lapply over for loops. There are some exceptions, as for example Hadley Wickham points out in his Advanced R book (http://adv-r.had.co.nz/Functionals.html): modifying in place, recursion, etc. The following is one of those cases.
Just for the sake of learning, I tried to rewrite a perceptron algorithm in functional form in order to benchmark the relative performance (source: https://rpubs.com/FaiHas/197581).
Here is the code.
    # prepare input
    data(iris)
    irissubdf <- iris[1:100, c(1, 3, 5)]
    names(irissubdf) <- c("sepal", "petal", "species")
    head(irissubdf)
    irissubdf$y <- 1
    irissubdf[irissubdf[, 3] == "setosa", 4] <- -1
    x <- irissubdf[, c(1, 2)]
    y <- irissubdf[, 4]

    # perceptron function with for
    perceptron <- function(x, y, eta, niter) {
        # initialize weight vector
        weight <- rep(0, dim(x)[2] + 1)
        errors <- rep(0, niter)
        # loop over number of epochs niter
        for (jj in 1:niter) {
            # loop through training data set
            for (ii in 1:length(y)) {
                # Predict binary label using Heaviside activation function
                z <- sum(weight[2:length(weight)] * as.numeric(x[ii, ])) + weight[1]
                if (z < 0) {
                    ypred <- -1
                } else {
                    ypred <- 1
                }
                # Change weight - the formula doesn't do anything
                # if the predicted value is correct
                weightdiff <- eta * (y[ii] - ypred) * c(1, as.numeric(x[ii, ]))
                weight <- weight + weightdiff
                # Update error function
                if ((y[ii] - ypred) != 0) {
                    errors[jj] <- errors[jj] + 1
                }
            }
        }
        # weight to decide between the two species
        return(errors)
    }

    err <- perceptron(x, y, 1, 10)

    ### my rewriting in functional form: auxiliary function
    faux <- function(x, weight, y, eta) {
        err <- 0
        z <- sum(weight[2:length(weight)] * as.numeric(x)) + weight[1]
        if (z < 0) {
            ypred <- -1
        } else {
            ypred <- 1
        }
        # Change weight - the formula doesn't do anything
        # if the predicted value is correct
        weightdiff <- eta * (y - ypred) * c(1, as.numeric(x))
        weight <<- weight + weightdiff
        # Update error function
        if ((y - ypred) != 0) {
            err <- 1
        }
        err
    }

    weight <- rep(0, 3)
    weightdiff <- rep(0, 3)

    f <- function() {
        t <- replicate(10, sum(unlist(lapply(seq_along(irissubdf$y), function(i) {
            faux(irissubdf[i, 1:2], weight, irissubdf$y[i], 1)
        }))))
        weight <<- rep(0, 3)
        t
    }
I did not expect any consistent improvement, due to the aforementioned issues. But nevertheless I was really surprised when I saw the sharp worsening using lapply and replicate.
I obtained these results using the microbenchmark function from the microbenchmark library. What could the reasons possibly be? Could it be some memory leak?
    expr                                                              min         lq       mean     median         uq        max neval
    f()                                                         48670.878 50600.7200 52767.6871 51746.2530 53541.2440 109715.673   100
    perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)    4184.131  4437.2990  4686.7506  4532.6655  4751.4795   6513.684   100
    perceptronC(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)     95.793   104.2045   123.7735   116.6065   140.5545    264.858   100
The first function is the lapply/replicate version, the second is the function with for loops, and the third is the same function in C++ using Rcpp.
Following Roland's suggestion, here is the profiling of the function. I am not sure I can interpret it the right way, but it looks to me like most of the time is spent in subsetting. [Function profiling output attached as an image.]
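To get a feel for how expensive repeated data.frame subsetting is compared to working on a matrix, here is a quick sketch (my own addition, not from the original post; it assumes irissubdf from above and the microbenchmark package):

    # Sketch: cost of one data.frame row subset vs. one matrix row subset
    xmat <- as.matrix(irissubdf[, 1:2])   # conversion done once, outside the loop
    microbenchmark::microbenchmark(
      df_row  = irissubdf[27, 1:2],  # dispatches `[.data.frame` and builds a new data.frame
      mat_row = xmat[27, ],          # plain C-level indexing
      times = 1000
    )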
The apply functions do run a for loop in the background. However, they often do it in the C programming language (in which R itself is written). This can make the apply functions a few milliseconds faster than regular for loops. However, this is not the main reason to use apply functions!
vapply can be a bit faster because it already knows what format to expect the results in.
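As a minimal illustration (my own sketch): vapply is told the shape of each result up front via FUN.VALUE, so it can skip the simplification guesswork that sapply does:

    x <- 1:10
    sapply(x, function(i) i^2)                          # must inspect results to decide how to simplify
    vapply(x, function(i) i^2, FUN.VALUE = numeric(1))  # result shape declared up front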
Choose sapply() for other reasons. The main point: both sapply() and for() loops are much slower than vectorized code.
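A minimal sketch of that main point, using squaring as the task (my example; numbers will vary by machine):

    x <- 1:1e5
    microbenchmark::microbenchmark(
      vectorized = x^2,                          # single call; the loop runs in C
      sapply     = sapply(x, function(i) i^2),   # one R function call per element
      forloop    = { y <- numeric(length(x)); for (i in x) y[i] <- i^2; y },
      times = 25
    )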
There is more to the question of when to use for or lapply, and which one "performs" better. Sometimes speed is important, other times memory is. To further complicate things, the time complexity may not be what you expect; that is, different behavior can be observed at different scales, invalidating any blanket statement such as "faster than" or "at least as fast as". Finally, one performance metric often overlooked is thought-to-code; premature optimization, yada yada.
That said, in the Introduction to R the authors hint at some performance concerns:
Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
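As a minimal illustration of that 'whole object' view (my example, not from the manual):

    x <- 1:10
    # element-by-element, as one would write it in a compiled language
    y <- numeric(length(x))
    for (i in seq_along(x)) y[i] <- x[i] * 2
    # 'whole object' view: one vectorized expression, clearer and faster
    y <- x * 2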
Given a similar use case, input and output, disregarding user preferences, is one clearly better than the other?
I compare approaches to compute 1 to N Fibonacci numbers (inspired by the benchmarkme package), shunning the 2nd Circle and ensuring that inputs and outputs for each approach are the same. Four additional approaches are included to throw some oil on the fire: a vectorized approach, purrr::map, and the *apply variants vapply and sapply.
    fib <- function(x, ...){
      x <- 1:x ; phi = 1.6180339887498949 ; v = \() vector("integer", length(x))
      bench::mark(
        vector = { y=v(); y = ((rep(phi, length(x))^x) - ((-rep(phi, length(x)))^-x)) / sqrt(5); y},
        lapply = { y=v(); y = unlist(lapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5)), use.names = F); y},
        loop   = { y=v(); `for`(i, x, {y[i] = (phi^i - (-phi)^(-i)) / sqrt(5)}); y},
        sapply = { y=v(); y = sapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5)); y},
        vapply = { y=v(); y = vapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5), 1); y},
        map    = { y=v(); y <- purrr::map_dbl(x, ~ (phi^. - (-phi)^(-.))/sqrt(5)); y },
        ..., check = T
      )[c(1:9)]
    }
Here is a comparison of the performance, ranked by median time.
    lapply(list(3e2, 3e3, 3e4, 3e5, 3e6, 3e7), fib)  # n iterations specified separately

    N = 300
      expression     min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
    1 vector      38.8us   40.9us    21812.    8.44KB  0        1000     0     45.8ms
    2 vapply       500us    545us     1653.    3.61KB  1.65      999     1      604ms
    3 sapply       518us    556us     1725.   12.48KB  0        1000     0      580ms
    4 lapply     513.4us  612.8us     1620.       6KB  8.14      995     5    614.2ms
    5 loop       549.9us  633.6us     1455.    3.61KB  8.78      994     6    683.3ms
    6 map        649.6us  754.6us     1312.    3.61KB  9.25      993     7    756.9ms

    N = 3000
    1 vector     769.7us  781.5us     1257.    82.3KB  1.26      999     1   794.83ms
    2 vapply      5.38ms   5.58ms      173.    35.2KB  0.697     996     4      5.74s
    3 sapply      5.59ms   5.83ms      166.   114.3KB  0.666     996     4      6.01s
    4 loop        5.38ms   5.91ms      167.    35.2KB  8.78      950    50      5.69s
    5 lapply      5.24ms   6.49ms      156.    58.7KB  8.73      947    53      6.07s
    6 map         6.11ms   6.63ms      148.    35.2KB  9.13      942    58      6.35s

    N = 30 000
    1 vector      10.7ms   10.9ms      90.9     821KB  0.918     297     3      3.27s
    2 vapply      57.3ms   60.1ms      16.4  351.66KB  0.741     287    13      17.5s
    3 loop        59.2ms   60.7ms      15.9     352KB 16.7       146   154      9.21s
    4 sapply      59.6ms   62.1ms      15.7    1.05MB  0.713     287    13      18.2s
    5 lapply      57.3ms   67.6ms      15.1     586KB 20.5       127   173      8.43s
    6 map         66.7ms   69.1ms      14.4     352KB 21.6       120   180      8.35s

    N = 300 000
    1 vector       190ms    193ms      5.14    8.01MB  0.206     100     4     19.45s
    2 loop         693ms    713ms      1.40    3.43MB  7.43      100   532      1.19m
    3 map          766ms    790ms      1.26    3.43MB  7.53      100   598      1.32m
    4 vapply       633ms    814ms      1.33    3.43MB  0.851     100    39      45.8s
    5 lapply       685ms    966ms      1.06    5.72MB  9.13      100   864      1.58m
    6 sapply       694ms    813ms      1.27   12.01MB  0.810     100    39      48.1s

    N = 3 000 000
    1 vector       3.17s    3.21s     0.312    80.1MB  0.249      20    16      1.07m
    2 vapply       8.22s    8.37s     0.118    34.3MB  4.97       20   845      2.83m
    3 loop          8.3s    8.42s     0.119    34.3MB  4.35       20   733      2.81m
    4 map          9.09s    9.17s     0.109    34.3MB  4.91       20   903      3.07m
    5 lapply      10.42s   11.09s    0.0901    57.2MB  4.10       20   909       3.7m
    6 sapply      10.43s   11.28s    0.0862   112.1MB  3.58       20   830      3.87m

    N = 30 000 000
    1 vector       44.8s   45.94s    0.0214     801MB  0.00854    10     4       7.8m
    2 vapply       1.56m     1.6m    0.0104     343MB  0.883      10   850        16m
    3 loop         1.56m    1.62m   0.00977     343MB  0.366      10   374      17.1m
    4 map          1.72m    1.74m   0.00959     343MB  1.23       10  1279      17.4m
    5 lapply       2.15m    2.22m   0.00748     572MB  0.422      10   565      22.3m
    6 sapply       2.05m    2.25m   0.00747    1.03GB  0.405      10   542      22.3m

    # Intel i5-8300H CPU @ 2.30GHz / R version 4.1.1 / purrr 0.3.4
The for and lapply approaches perform similarly, but lapply is greedier when it comes to memory, and a bit slower when the size of the input increases (for this task). Note that purrr::map memory usage is equivalent to that of the for loop, and superior to that of lapply, in itself a debated topic. However, when the appropriate *apply is used, here vapply, the performance is similar. But the choice could have a large impact on memory use, sapply being noticeably less memory efficient than vapply.
A peek under the hood reveals the reason for the different performance of the approaches. The for loop performs many type checks, resulting in some overhead. lapply, on the other hand, suffers from a flawed language design where lazy evaluation, or the use of promises, comes at a cost; the source code confirms that the X and FUN arguments to .Internal(lapply) are promises.
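You can see how thin the R-level wrapper is by printing lapply at the console before the .Internal call takes over (output from a recent R 4.x; details may differ between versions):

    lapply
    #> function (X, FUN, ...)
    #> {
    #>     FUN <- match.fun(FUN)
    #>     if (!is.vector(X) || is.object(X))
    #>         X <- as.list(X)
    #>     .Internal(lapply(X, FUN))
    #> }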
Vectorized approaches are fast, and probably desirable over a for or lapply approach. Notice how the vectorized approach grows irregularly compared to the other approaches. However, the aesthetics of vectorized code may be a concern: which approach would you prefer to debug?
Overall, I'd say a choice between lapply and for is not something the average R user should ponder over. Stick to what is easiest to write, think of, and debug, or what is less (silently?) error-prone. What is lost in performance will likely be canceled out by time saved writing. For performance-critical applications, make sure to run some tests with different input sizes and to properly chunk code.
First of all, it is an already long-debunked myth that for loops are any slower than lapply. The for loops in R have been made a lot more performant and are currently at least as fast as lapply.
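A quick sanity check of that claim (my sketch; the exact numbers depend on machine and R version):

    n <- 1e4
    microbenchmark::microbenchmark(
      forloop = { out <- numeric(n); for (i in 1:n) out[i] <- i + 1; out },  # preallocated
      lapply  = unlist(lapply(1:n, function(i) i + 1)),
      times = 100
    )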
That said, you have to rethink your use of lapply here. Your implementation demands assigning to the global environment, because your code requires you to update the weight during the loop. And that is a valid reason to not consider lapply.
lapply is a function you should use for its (lack of) side effects: the function combines the results in a list automatically and doesn't mess with the environment you work in, contrary to a for loop. The same goes for replicate. See also this question:
Is R's apply family more than syntactic sugar?
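A small sketch of that difference (my example, run in a fresh session): a for loop leaves its counter and any assignments behind in the workspace, while lapply collects everything into a list and leaves the environment alone:

    for (i in 1:3) squared <- i^2
    exists("i")        # TRUE: the loop counter is now part of the workspace
    squared            # 9: only the last value survived

    res <- lapply(1:3, function(j) j^2)
    exists("j")        # FALSE: j lived only inside the anonymous function
    res                # all three results, collected in a list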
The reason your lapply solution is far slower is that your way of using it creates a lot more overhead:
- replicate is nothing else but sapply internally, so you actually combine sapply and lapply to implement your double loop. sapply creates extra overhead because it has to test whether or not the result can be simplified, so a for loop will actually be faster than using replicate.
- In your lapply anonymous function, you have to access the dataframe for both x and y for every observation. This means that, contrary to your for loop, e.g. the function $ has to be called every time.
- Your lapply solution calls considerably more functions than your for solution, which only calls 26. These extra functions for the lapply solution include calls to functions like match, structure, [[, names, %in%, sys.call, duplicated, ... all of them functions not needed by your for loop, as it doesn't do any of these checks.

If you want to see where this extra overhead comes from, look at the internal code of replicate, unlist, sapply and simplify2array.
You can use the following code to get a better idea of where you lose performance with lapply. Run it line by line!
    Rprof(interval = 0.0001)
    f()
    Rprof(NULL)
    fprof <- summaryRprof()$by.self

    Rprof(interval = 0.0001)
    perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)
    Rprof(NULL)
    perprof <- summaryRprof()$by.self

    fprof$Fun <- rownames(fprof)
    perprof$Fun <- rownames(perprof)
    Selftime <- merge(fprof, perprof, all = TRUE, by = 'Fun',
                      suffixes = c(".lapply", ".for"))

    sum(!is.na(Selftime$self.time.lapply))
    sum(!is.na(Selftime$self.time.for))
    Selftime[order(Selftime$self.time.lapply, decreasing = TRUE),
             c("Fun", "self.time.lapply", "self.time.for")]
    Selftime[is.na(Selftime$self.time.for), ]