I've just noticed that classing a list, either by adding an extra label to its class attribute (S3) or by defining a new class with list as its parent (S4), drastically slows down the basic lengths() operation. This suggests I should always unclass a "classed list" before calling lengths().
Can anyone explain why this happens, and/or suggest a better solution (or explain why this does not really matter, since the differences are just microseconds in absolute terms)?
Reproducible code:
# create a list of 1,000 elements with variable letter lengths
mylist <- list()
length(mylist) <- 1000
set.seed(99)
mylist <- lapply(mylist, function(x) {
  sample(LETTERS, size = sample(1:100, size = 1), replace = TRUE)
})
# create an S3 "classed" version
mylist_S3classed <- mylist
class(mylist_S3classed) <- c("myclass", "list")
# create an S4 classed version
setClass("mylist_S4class", contains = "list")
mylist_S4classed <- new("mylist_S4class", mylist)
# compare timings of lengths
microbenchmark::microbenchmark(
  lengths(mylist),
  lengths(mylist_S3classed),
  lengths(mylist_S4classed),
  unit = "relative"
)
## Unit: relative
##                      expr      min       lq     mean    median        uq       max neval
##           lengths(mylist)   1.0000   1.0000   1.0000   1.00000   1.00000   1.00000   100
## lengths(mylist_S3classed) 125.1433 119.3588 103.9747  91.90734  89.56034 291.97767   100
## lengths(mylist_S4classed) 162.4045 155.4870 119.0611 120.20908 111.95417  67.55309   100
## in absolute timings
microbenchmark::microbenchmark(
  lengths(mylist),
  lengths(mylist_S3classed),
  lengths(mylist_S4classed)
)
## Unit: microseconds
##                      expr      min        lq       mean    median       uq      max neval
##           lengths(mylist)    6.401    6.9475    9.66612    9.4620   10.577   29.237   100
## lengths(mylist_S3classed)  792.738  851.0895  911.97067  898.0955  939.558 1604.189   100
## lengths(mylist_S4classed) 1050.448 1104.7920 1293.63965 1173.4545 1229.485 6431.130   100
This extra time is the time R spends looking for the right methods. For a plain old list it's easy and heavily optimised: lengths() walks the underlying vector and reads each element's length directly, with no method lookup at all.
For a classed object, be it S3 or S4, R can't take that shortcut, because length (or [[) could be defined as a method for your class. The ?lengths help page says as much: effectively, length(x[[i]]) is called for every i, so any methods on length are considered. So R has to go on a hunt for each element, and in your case it looks everywhere until it falls through to the default. Repeating that for all 1,000 elements adds up to the roughly one millisecond you measured.
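A minimal sketch of why lengths() can't simply skip the lookup: when an element carries a class that has a length method, lengths() uses that method instead of the raw vector length. The class "fixedlen" and the method length.fixedlen are hypothetical names made up for illustration.

```r
# Hypothetical S3 class whose length method always reports 42
length.fixedlen <- function(x) 42L

# An empty list tagged with that class, used as a list element
el <- structure(list(), class = "fixedlen")

# lengths() effectively calls length(x[[i]]) for each i, so the method is honoured
res <- lengths(list(el, 1:3))
print(res)

# Stripping the class bypasses the method and exposes the raw length (0)
raw <- lengths(list(unclass(el), 1:3))
print(raw)
```

Because R has to allow for methods like this one, any classed object sends lengths() down the slower, dispatching path, even when no such method exists.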
Don't go unclassing things just to speed this up unless you can promise your future self that you'll never write a length method for these objects. If you ever do, the unclassed call will silently skip that method and your code will break...
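A small sketch of that failure mode, using a hypothetical class "mycounted" whose length method reports a stored count (the class, the method, and the "n" attribute are all illustrative assumptions): length() on the object respects the method, while unclass() quietly falls back to the raw list length.

```r
# Hypothetical length method: report a stored count instead of the raw list length
length.mycounted <- function(x) attr(x, "n")

obj <- structure(list(1:3, 4:6), n = 10L, class = "mycounted")

# Dispatches to length.mycounted(), giving 10
classed_len <- length(obj)

# unclass() drops the class, so the method is bypassed, giving 2
unclassed_len <- length(unclass(obj))
```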