Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is S4 method dispatch slow?

Tags:

r

dispatch

s4

My S4 class has a method that is called many times. I noticed that the execution time is much slower than it would be if a similar function was called independently. So I added a slot with type "function" to my class and used that function instead of the method. The example below shows two ways of doing this, and both of them run much faster than the corresponding method. Also, the example suggests that the lower speed of the method is not due to method having to retrieve data from the class, since the functions are faster even when they also do that.

Of course, this way of doing things is not ideal. I wonder if there is a way to accelerate method dispatch. Any suggestions?

    setClass(Class = "SpeedTest", 
      representation = representation(
        x = "numeric",
        foo1 = "function",
        foo2 = "function"
      )
    )

    speedTest <- function(n) {
      new("SpeedTest",
        x = rnorm(n),
        foo1 = function(z) sqrt(abs(z)),
        foo2 = function() {}
      )
    }

    setGeneric(
      name = "method.foo",
      def = function(object) {standardGeneric("method.foo")}
    )
    setMethod(
      f = "method.foo", 
      signature = "SpeedTest",
      definition = function(object) {
        sqrt(abs(object@x))
      }
    )

    setGeneric(
      name = "create.foo2",
      def = function(object) {standardGeneric("create.foo2")}
    )
    setMethod(
      f = "create.foo2", 
      signature = "SpeedTest",
      definition = function(object) {
        z <- object@x
        object@foo2 <- function() sqrt(abs(z))

        object
      }
    )

    > st <- speedTest(1000)
    > st <- create.foo2(st)
    > 
    > iters <- 100000
    > 
    > system.time(for (i in seq(iters)) method.foo(st)) # slowest by far
       user  system elapsed 
       3.26    0.00    3.27 

    > # much faster 
    > system.time({foo1 <- st@foo1; x <- st@x; for (i in seq(iters)) foo1(x)}) 
       user  system elapsed 
      1.47    0.00    1.46 

    > # retrieving st@x instead of x does not affect speed
    > system.time({foo1 <- st@foo1; for (i in seq(iters)) foo1(st@x)}) 
       user  system elapsed 
       1.47    0.00    1.49 

    > # same speed as foo1 although no explicit argument
    > system.time({foo2 <- st@foo2; for (i in seq(iters)) foo2()}) 
       user  system elapsed 
       1.44    0.00    1.45 

     # Cannot increase speed by using a lambda to "eliminate" the argument of method.foo
     > system.time({foo <- function() method.foo(st); for (i in seq(iters)) foo()})  
        user  system elapsed 
        3.28    0.00    3.29
like image 977
Soldalma Avatar asked May 05 '13 15:05

Soldalma


2 Answers

The cost is in method look-up, which starts from scratch in each iteration of your timing. This can be short-circuited by figuring out method dispatch once

METHOD <- selectMethod(method.foo, class(st))
for (i in seq(iters)) METHOD(st)

This (better method look-up) would be a very interesting and worth-while project; there are valuable lessons learned in other dynamic languages, e.g., inline caching mentioned on Wikipedia's dynamic dispatch page.

I wonder if the reason that you're making many method calls is because of incomplete vectorization of your data representation and methods?

like image 57
Martin Morgan Avatar answered Nov 26 '22 01:11

Martin Morgan


This doesn't help you directly with your problem, but it's much easier to benchmark this sort of stuff with the microbenchmark package:

f <- function(x) NULL

s3 <- function(x) UseMethod("s3")
s3.integer <- function(x) NULL

A <- setClass("A", representation(a = "list"))
setGeneric("s4", function(x) standardGeneric("s4"))
setMethod(s4, "A", function(x) NULL)

B <- setRefClass("B")
B$methods(r5 = function(x) NULL)

a <- A()
b <- B$new()

library(microbenchmark)
options(digits = 3)
microbenchmark(
  bare = NULL,
  fun = f(),
  s3 = s3(1L),
  s4 = s4(a),
  r5 = b$r5()
)
# Unit: nanoseconds
#  expr   min    lq median    uq   max neval
#  bare    13    20     22    29    36   100
#   fun   171   236    270   310   805   100
#    s3  2025  2478   2651  2869  8603   100
#    s4 10017 11029  11528 11905 36149   100
#    r5  9080 10003  10390 10804 61864   100

On my computer, the bare call takes about 20 ns. Wrapping it in a function adds about an extra 200 ns - this is the cost of creating the environment where the function execution happens. S3 method dispatch adds around 3 µs and S4/ref classes around 12 µs.

like image 32
hadley Avatar answered Nov 26 '22 00:11

hadley