If so, why do we need sapply?
library(microbenchmark)

# Build a list of 20,000 small named lists to extract "a" from
x <- list(a = 1, b = 1)
y <- list(a = 1)
JSON <- rep(list(x, y), 10000)
microbenchmark(
  sapply(JSON, function(x) x$a),
  unlist(lapply(JSON, function(x) x$a)),
  sapply(JSON, "[[", "a"),
  unlist(lapply(JSON, "[[", "a"))
)
Unit: milliseconds
                                  expr      min       lq   median       uq      max neval
         sapply(JSON, function(x) x$a) 25.22623 28.55634 29.71373 31.76492 88.26514   100
 unlist(lapply(JSON, function(x) x$a)) 17.85278 20.25889 21.61575 22.67390 78.54801   100
               sapply(JSON, "[[", "a") 18.85529 20.06115 21.53790 23.42480 38.56610   100
       unlist(lapply(JSON, "[[", "a")) 11.33859 11.69198 12.25329 13.37008 27.81361   100
For loops in R have been made considerably more performant over the years and are currently at least as fast as lapply.
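You can sanity-check that claim yourself on the JSON list defined above; for_loop here is just an illustrative helper, not part of the original answers:

# A preallocated R-level for loop doing the same extraction as lapply
for_loop <- function(lst) {
  out <- vector("list", length(lst))
  for (i in seq_along(lst)) out[[i]] <- lst[[i]]$a
  out
}
microbenchmark(for_loop(JSON), lapply(JSON, function(x) x$a))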
The apply functions do run a for loop in the background; however, they often do it in C (the language R itself is written in), which can make them slightly faster than regular R-level for loops.
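You can see this by printing lapply at the console: the actual iteration is delegated to .Internal, i.e., to C code (body shown from a recent R version; it may differ slightly in yours):

> lapply
function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}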
In addition to running lapply, sapply runs simplify2array to try to fit the output into an array. To figure out whether that is possible, the function needs to check if all the individual outputs have the same length; this is done via a costly unique(lapply(..., length)), which accounts for most of the time difference you were seeing:
b <- lapply(JSON, "[[", "a")
microbenchmark(
  lapply(JSON, "[[", "a"),
  unlist(b),
  unique(lapply(b, length)),
  sapply(JSON, "[[", "a"),
  sapply(JSON, "[[", "a", simplify = FALSE),
  unlist(lapply(JSON, "[[", "a"))
)
# Unit: microseconds
#                                        expr       min        lq   median        uq       max neval
#                     lapply(JSON, "[[", "a") 14809.151 15384.358 15774.26 16905.226 24944.863   100
#                                   unlist(b)   920.047  1043.719  1158.62  1223.091  8056.231   100
#                   unique(lapply(b, length)) 10778.065 11060.452 11456.11 12581.414 19717.740   100
#                     sapply(JSON, "[[", "a") 24827.206 25685.535 26656.88 30519.556 93195.751   100
#  sapply(JSON, "[[", "a", simplify = FALSE) 14283.541 14922.780 15526.42 16654.058 26865.022   100
#             unlist(lapply(JSON, "[[", "a")) 15334.026 16133.146 16607.12 18476.182 30080.544   100
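Inspecting sapply itself confirms this structure: it is literally lapply plus an optional simplify2array pass (source shown from a recent R version; older versions differ slightly):

> sapply
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    FUN <- match.fun(FUN)
    answer <- lapply(X = X, FUN = FUN, ...)
    if (USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    if (!isFALSE(simplify))
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}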
As droopy and Roland pointed out, sapply is a wrapper function for lapply designed for convenient use. sapply uses simplify2array, which is slower than unlist:
> microbenchmark(unlist(as.list(1:1000)), simplify2array(as.list(1:1000)), times = 1000)
Unit: microseconds
                            expr     min       lq  median       uq      max neval
         unlist(as.list(1:1000))  99.734 109.0230 113.912 118.3120 21343.92  1000
 simplify2array(as.list(1:1000)) 892.712 931.0895 947.957 976.3125 22241.52  1000
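If you want simplification without the guessing, vapply is worth a look: you supply a template describing each element's expected type and length, so the uniformity check happens in C rather than via a post-hoc unique(lapply(..., length)) in R. A minimal sketch on the JSON list from above (timings omitted; run it yourself to compare):

# numeric(1) declares that every extracted element must be a single number
microbenchmark(
  sapply(JSON, "[[", "a"),
  vapply(JSON, "[[", numeric(1), "a")
)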
Also, when returning a matrix, sapply is slower than building the same matrix with other base functions, for example:
a <- list(c(1, 2, 3, 4), c(1, 2, 3, 4), c(1, 2, 3, 4))
microbenchmark(t(do.call(rbind, lapply(a, function(x) x))), sapply(a, function(x) x))
Unit: microseconds
                                        expr    min     lq median     uq     max neval
 t(do.call(rbind, lapply(a, function(x) x)))  29.823 30.801 32.512 33.734  94.845   100
                    sapply(a, function(x) x)  57.201 58.179 59.156 60.134 111.956   100
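Both expressions build the same 4-by-3 matrix (each list element becomes a column), which is easy to verify:

identical(t(do.call(rbind, lapply(a, function(x) x))), sapply(a, function(x) x))
# [1] TRUE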
But especially in the second case, sapply is much easier to use.