I have an R package which currently uses the S3 class system, with two different classes and several methods for generic S3 functions like plot, logLik and update (for model formula updating). My code has become more complex with all the validity checking and if/else structures, because S3 has no formal inheritance and no dispatch on two arguments, so I have started to think about converting my package to S4. But then I started to read about the advantages and disadvantages of S3 versus S4, and I'm not so sure anymore. I found an R-bloggers blog post about efficiency issues in S3 vs S4, and as that was 5 years ago, I tested the same thing now:
library(microbenchmark)
setClass("MyClass", representation(x="numeric"))
microbenchmark(structure(list(x=rep(1, 10^7)), class="MyS3Class"),
new("MyClass", x=rep(1, 10^7)) )
Unit: milliseconds
                                                   expr       min       lq   median       uq      max neval
 structure(list(x = rep(1, 10^7)), class = "MyS3Class") 148.75049 152.3811 155.2263 159.8090 323.5678   100
                       new("MyClass", x = rep(1, 10^7))  75.15198 123.4804 129.6588 131.5031 241.8913   100
So in this simple example, S4 was actually a bit faster. Then I read an SO question about using S3 vs S4, whose answers were very much in favor of S3. Especially @joshua-ulrich's answer made me doubt S4, as it said that
any slot change requires a full object copy
That feels like a big issue in my case, where I'm updating my object in every iteration when optimizing the log-likelihood of my model. After some googling I found a post by John Chambers about this issue, which seems to be changing in R 3.0.0.
So although I feel it would be beneficial to use S4 classes for some clarity in my code (for example, more classes inheriting from the main model class) and for the validity checks etc., I am now wondering whether it is worth all the work in terms of performance. So, performance-wise, are there real differences between S3 and S4? Are there other performance issues I should be considering? Or is it even possible to say something about this issue in general?
EDIT: As @DWin and @g-grothendieck suggested, the above benchmarking doesn't consider the case where the slot of an existing object is altered. So here's another benchmark which is more relevant to the true application (the functions in the example could be get/set functions for some elements in the model, which are altered when maximizing the log-likelihood):
objS3<-structure(list(x=rep(1, 10^3), z=matrix(0,10,10), y=matrix(0,10,10)),
class="MyS3Class")
fnS3<-function(obj,a){
obj$y<-a
obj
}
setClass("MyClass", representation(x="numeric",z="matrix",y="matrix"))
objS4<-new("MyClass", x=rep(1, 10^3),z=matrix(0,10,10),y=matrix(0,10,10))
fnS4<-function(obj,a){
obj@y<-a
obj
}
a<-matrix(1:100,10,10)
microbenchmark(fnS3(objS3,a),fnS4(objS4,a))
Unit: microseconds
expr min lq median uq max neval
fnS3(objS3, a) 6.531 7.464 7.932 9.331 26.591 100
fnS4(objS4, a) 21.459 22.393 23.325 23.792 73.708 100
The benchmarks were performed on R 2.15.2, on 64-bit Windows 7. So here S4 is clearly slower.
First of all, you can easily have S3 methods for S4 classes:
> extract <- function (x, ...) x@x
> setGeneric ("extr4", def=function (x, ...){})
[1] "extr4"
> setMethod ("extr4", signature= "MyClass", definition=extract)
[1] "extr4"
> `[.MyClass` <- extract
> `[.MyS3Class` <- function (x, ...) x$x
> microbenchmark (objS3[], objS4 [], extr4 (objS4), extract (objS4))
Unit: nanoseconds
expr min lq median uq max neval
objS3[] 6775 7264.5 7578.5 8312.0 39531 100
objS4[] 5797 6705.5 7124.0 7404.0 13550 100
extr4(objS4) 20534 21512.0 22106.0 22664.5 54268 100
extract(objS4) 908 1188.0 1328.0 1467.0 11804 100
Edit: following Hadley's comment, I changed the experiment to plot:
> `plot.MyClass` <- extract
> `plot.MyS3Class` <- function (x, ...) x$x
> microbenchmark (plot (objS3), plot (objS4), extr4 (objS4), extract (objS4))
Unit: nanoseconds
expr min lq median uq max neval
plot(objS3) 28915 30172.0 30591 30975.5 1887824 100
plot(objS4) 25353 26121.0 26471 26960.0 411508 100
extr4(objS4) 20395 21372.5 22001 22385.5 31359 100
extract(objS4) 979 1328.0 1398 1677.0 3982 100
For an S4 method for plot I get:
plot(objS4) 19835 20428.5 21336.5 22175.0 58876 100
So yes, [ has an exceptionally fast dispatch mechanism (which is good, because I think extraction and the corresponding replacement functions are among the most frequently called methods). But no, S4 dispatch isn't slower than S3 dispatch.
Here the S3 method on the S4 object is as fast as the S3 method on the S3 object. However, calling without dispatch is still faster.
There are some things that work much better as S3, such as as.matrix or as.data.frame. For some reason, defining these as S3 methods means that e.g. lm (formula, objS4) will work out of the box. This doesn't work when as.data.frame is defined as an S4 method.
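A minimal sketch of that point, assuming a made-up S4 class Samples with two numeric slots (the class, its slots and the data are hypothetical, not taken from the benchmarks above). The S3 method is also registered explicitly so that dispatch from inside lm does not depend on where this code is run:

```r
library(methods)

# Hypothetical S4 class holding two numeric columns (illustration only).
setClass("Samples", representation(x = "numeric", y = "numeric"))

# S3 method for the S3 generic as.data.frame, keyed on the S4 class name.
as.data.frame.Samples <- function(x, ...) {
  data.frame(x = x@x, y = x@y)
}

# Explicit registration; at top level of a script the method would
# normally be found anyway, in a package one would use S3method().
registerS3method("as.data.frame", "Samples", as.data.frame.Samples)

obj <- new("Samples", x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

# lm() builds its model frame from `data`; a classed object that is not
# a data frame is converted with as.data.frame() along the way.
fit <- lm(y ~ x, data = obj)
coef(fit)
```

The object itself stays an S4 instance; only the conversion hook is written in S3 style.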
Also, it is much more convenient to call debug on an S3 method.
Some other things will not work with S3, e.g. dispatching on the second argument.
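For illustration, here is a small sketch of dispatch on two arguments in S4. The classes Dense and Sparse and the generic describePair are invented for this example:

```r
library(methods)

# Hypothetical classes, used only to show dispatch on two arguments.
setClass("Dense",  representation(values = "numeric"))
setClass("Sparse", representation(values = "numeric"))

# The generic dispatches on both `a` and `b`.
setGeneric("describePair", function(a, b) standardGeneric("describePair"))

setMethod("describePair", signature("Dense", "Dense"),
          function(a, b) "dense + dense")
setMethod("describePair", signature("Dense", "Sparse"),
          function(a, b) "dense + sparse")
# "ANY" acts as a fallback for the second argument.
setMethod("describePair", signature("Sparse", "ANY"),
          function(a, b) "sparse + anything")

d <- new("Dense",  values = c(1, 2))
s <- new("Sparse", values = c(0, 3))

describePair(d, d)  # "dense + dense"
describePair(d, s)  # "dense + sparse"
describePair(s, d)  # "sparse + anything"
```

In S3 the same behaviour would need explicit if/else checks on the class of the second argument inside each method.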
Whether there will be any noticeable drop in performance obviously depends on your class, that is, what kind of structures you have, how large the objects are and how often methods are called. A few μs of method dispatch won't matter for a calculation that takes ms or even s. But μs do matter when a function is called billions of times.
One thing that caused a noticeable performance drop for some functions that are called often ([) is S4 validation (a fair number of checks are done in validObject). However, I'm glad to have it, so I use it. Internally I use workhorse functions that skip this step.
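A rough sketch of that pattern, with a hypothetical class Probs and hypothetical helpers setP and .setPFast (this is not the code from my package, just the idea): the user-facing function validates, the internal workhorse does not.

```r
library(methods)

# Hypothetical class with a validity check (names are illustrative only).
setClass("Probs", representation(p = "numeric"),
         validity = function(object) {
           if (any(object@p < 0 | object@p > 1)) "p must lie in [0, 1]" else TRUE
         })

# User-facing setter: runs the full validity check after the change.
setP <- function(obj, value) {
  obj@p <- value
  validObject(obj)
  obj
}

# Internal workhorse: same assignment, but no validObject() call.
# `@<-` still checks that the value has the slot's declared class.
.setPFast <- function(obj, value) {
  obj@p <- value
  obj
}

obj <- new("Probs", p = c(0.2, 0.8))
ok  <- setP(obj, c(0.5, 0.5))
bad <- .setPFast(obj, c(-1, 2))   # accepted, because validity is skipped
res <- tryCatch(setP(obj, c(-1, 2)), error = function(e) "rejected")
```

The fast path is only safe where the calling code already guarantees a valid value, for example inside an optimizer loop that never leaves the allowed range.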
In case you have large data and call-by-reference would help your performance, you may want to have a look at reference classes. I've never really worked with them so far, so I cannot comment on this.
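For orientation only, a minimal reference-class sketch (the class Counter, its field and its method are made up). It shows the one property relevant here: the object is modified in place, so a function does not have to return a modified copy.

```r
library(methods)

# Hypothetical reference class; field and method names are illustrative.
Counter <- setRefClass("Counter",
  fields  = list(total = "numeric"),
  methods = list(
    add = function(value) {
      total <<- total + value   # modifies this object in place
      invisible(.self)
    }
  )
)

# The caller's object changes even though nothing is returned.
bump <- function(counter) {
  counter$add(1)
  invisible(NULL)
}

cnt <- Counter$new(total = 0)
bump(cnt)
bump(cnt)
cnt$total  # 2
```

Whether this actually helps depends on the application, so it is worth benchmarking against the S3 and S4 versions.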
If you are concerned about performance, benchmark it. If you really need multiple inheritance or multiple dispatch, use S4. Otherwise use S3.
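As a small illustration of the multiple inheritance case, assuming invented classes Fitted, Plottable and FittedModel and invented generics fitSummary and plotTitle:

```r
library(methods)

# Hypothetical parent classes (illustration only).
setClass("Fitted",    representation(logLik = "numeric"))
setClass("Plottable", representation(title = "character"))

# A class that inherits the slots of both parents.
setClass("FittedModel", contains = c("Fitted", "Plottable"))

setGeneric("fitSummary", function(obj) standardGeneric("fitSummary"))
setGeneric("plotTitle",  function(obj) standardGeneric("plotTitle"))

# One method per parent class; the child inherits both.
setMethod("fitSummary", "Fitted",    function(obj) obj@logLik)
setMethod("plotTitle",  "Plottable", function(obj) obj@title)

m <- new("FittedModel", logLik = -12.5, title = "demo")

is(m, "Fitted")     # TRUE
is(m, "Plottable")  # TRUE
fitSummary(m)       # -12.5
plotTitle(m)        # "demo"
```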