
Why is `unlist(lapply)` faster than `sapply`?

Tags: r

If so, why do we need sapply?

library(microbenchmark)  # provides microbenchmark()

x <- list(a=1, b=1)
y <- list(a=1)
JSON <- rep(list(x,y),10000)
microbenchmark(sapply(JSON, function(x) x$a),
               unlist(lapply(JSON, function(x) x$a)),
               sapply(JSON, "[[", "a"),
               unlist(lapply(JSON, "[[", "a"))
               )

Unit: milliseconds
                                  expr      min       lq   median       uq      max neval
         sapply(JSON, function(x) x$a) 25.22623 28.55634 29.71373 31.76492 88.26514   100
 unlist(lapply(JSON, function(x) x$a)) 17.85278 20.25889 21.61575 22.67390 78.54801   100
               sapply(JSON, "[[", "a") 18.85529 20.06115 21.53790 23.42480 38.56610   100
       unlist(lapply(JSON, "[[", "a")) 11.33859 11.69198 12.25329 13.37008 27.81361   100
colinfang asked Sep 12 '13 10:09



2 Answers

In addition to running lapply, sapply calls simplify2array to try to fit the output into an array. To determine whether that is possible, simplify2array has to check that all the individual outputs have the same length. This is done via a costly unique(lapply(..., length)), which accounts for most of the time difference you are seeing:

b <- lapply(JSON, "[[", "a")

microbenchmark(lapply(JSON, "[[", "a"),
               unlist(b),
               unique(lapply(b, length)),
               sapply(JSON, "[[", "a"),
               sapply(JSON, "[[", "a", simplify = FALSE),
               unlist(lapply(JSON, "[[", "a"))
)

# Unit: microseconds
#                                       expr       min        lq   median        uq       max neval
#                    lapply(JSON, "[[", "a") 14809.151 15384.358 15774.26 16905.226 24944.863   100
#                                  unlist(b)   920.047  1043.719  1158.62  1223.091  8056.231   100
#                  unique(lapply(b, length)) 10778.065 11060.452 11456.11 12581.414 19717.740   100
#                    sapply(JSON, "[[", "a") 24827.206 25685.535 26656.88 30519.556 93195.751   100
#  sapply(JSON, "[[", "a", simplify = FALSE) 14283.541 14922.780 15526.42 16654.058 26865.022   100
#            unlist(lapply(JSON, "[[", "a")) 15334.026 16133.146 16607.12 18476.182 30080.544   100
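The simplification step described above can be seen on a small input. This is only a sketch: the `records` and `ragged` lists are hypothetical toy data, not the `JSON` object from the question. It suggests that the two approaches agree when every result has length one, and that they differ once the result lengths are unequal.

```r
# Hypothetical toy data, shaped like the question's list of records.
records <- list(list(a = 1, b = 2), list(a = 3), list(a = 5, b = 6))

# Every element yields exactly one value, so both approaches give the same vector.
via_sapply <- sapply(records, "[[", "a")
via_unlist <- unlist(lapply(records, "[[", "a"))
same_vector <- identical(via_sapply, via_unlist)

# With simplify = FALSE, sapply skips the simplification step and
# returns the list produced by lapply.
same_list <- identical(sapply(records, "[[", "a", simplify = FALSE),
                       lapply(records, "[[", "a"))

# When the result lengths differ, sapply cannot simplify and returns a list,
# whereas unlist flattens everything into a single vector.
ragged <- list(1, c(2, 3), 4)
ragged_sapply <- sapply(ragged, function(v) v)
ragged_unlist <- unlist(lapply(ragged, function(v) v))

print(same_vector)
print(same_list)
print(is.list(ragged_sapply))
print(length(ragged_unlist))
```

The ragged case is one reason sapply's extra check can be worth its cost: it keeps unequal-length results separate instead of silently merging them.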
flodel answered Oct 13 '22 13:10


As droopy and Roland pointed out, sapply is a convenience wrapper around lapply. sapply uses simplify2array, which is slower than unlist:

> microbenchmark(unlist(as.list(1:1000)), simplify2array(as.list(1:1000)), times=1000)
Unit: microseconds
                            expr     min       lq  median       uq      max neval
         unlist(as.list(1:1000))  99.734 109.0230 113.912 118.3120 21343.92  1000
 simplify2array(as.list(1:1000)) 892.712 931.0895 947.957 976.3125 22241.52  1000

Also, when the result is a matrix, sapply is slower than an equivalent built from other base functions, for example:

a <- list(c(1,2,3,4), c(1,2,3,4), c(1,2,3,4))
microbenchmark(t(do.call(rbind, lapply(a, function(x)x))), sapply(a, function(x)x))
Unit: microseconds
                                        expr    min     lq median     uq     max neval
 t(do.call(rbind, lapply(a, function(x) x))) 29.823 30.801 32.512 33.734  94.845   100
                    sapply(a, function(x) x) 57.201 58.179 59.156 60.134 111.956   100

But especially in the second case, sapply is much easier to use.
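As a quick check that the two expressions in the matrix benchmark are interchangeable, the sketch below compares their results on the same input `a`. The names `via_rbind` and `via_sapply` are introduced here for illustration only.

```r
a <- list(c(1, 2, 3, 4), c(1, 2, 3, 4), c(1, 2, 3, 4))

# Bind the vectors as rows, then transpose so each vector becomes a column.
via_rbind <- t(do.call(rbind, lapply(a, function(x) x)))

# sapply places each result in its own column directly.
via_sapply <- sapply(a, function(x) x)

print(dim(via_rbind))
print(dim(via_sapply))
print(all.equal(via_rbind, via_sapply))
```

Both forms should give a matrix with one column per list element, so the choice between them comes down to speed versus readability.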

user1981275 answered Oct 13 '22 13:10