The S3 and S4 software in R are two generations implementing functional object-oriented programming. S3 is the original, simpler for initial programming but less general, less formal and less open to validation. The S4 formal methods and classes provide these features but require more programming.
S3 implements a style of object oriented programming called generic-function OO. This is different to most programming languages, like Java, C++ and C#, which implement message-passing OO. In message-passing style, messages (methods) are sent to objects and the object determines which function to call.
Description. R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method dispatch takes place based on the class(es) of the first argument to the generic function or of the object supplied as an argument to UseMethod or NextMethod.
The S4 system in R is a system for object-oriented programming. Confusingly, R has support for at least 3 different systems for object-oriented programming: S3, S4 and reference classes (sometimes informally called R5).
Most of the relevant information can be found by looking at ?S3 or ?UseMethod, but in a nutshell:
S3 refers to a scheme of method dispatching. If you've used R for a while, you'll notice that there are print, predict and summary methods for a lot of different kinds of objects.
In S3, this works by:
- setting the class of the objects of interest (e.g. the return value of a call to glm has class glm)
- providing a method named with the general name (e.g. print), then a dot, and then the classname (e.g. print.glm)
- some preparation has to have been done to the general name (print) for this to work, but if you're simply looking to conform to existing method names, you don't need this (see the help I referred to earlier if you do).
To the eye of the beholder, and particularly, the user of your newly created funky model fitting package, it is much more convenient to be able to type predict(myfit, type="class") than predict.mykindoffit(myfit, type="class").
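As a rough illustration of the naming scheme described above, here is a minimal sketch. The class name mykindoffit comes from the text, but the constructor, the threshold field and the newdata argument are made up for the example and do not belong to any real package.

```r
# Hypothetical constructor: stores a threshold and tags the result with a class.
mykindoffit <- function(threshold) {
  structure(list(threshold = threshold), class = "mykindoffit")
}

# Method for the existing generic predict(), named <generic>.<class>.
predict.mykindoffit <- function(object, newdata, type = c("class", "score"), ...) {
  type <- match.arg(type)
  score <- newdata - object$threshold
  if (type == "class") ifelse(score > 0, "high", "low") else score
}

myfit <- mykindoffit(threshold = 2)

# The user calls the generic; dispatch finds predict.mykindoffit via class(myfit).
res <- predict(myfit, newdata = c(1, 3), type = "class")
print(res)
```

Because predict is already a generic, nothing else needs to be set up: defining the method under the right name is enough.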
There is quite a bit more to it, but this should get you started. There are quite a few disadvantages to this way of dispatching methods based upon an attribute (class) of objects (and C purists probably lie awake at night in horror of it), but for a lot of situations, it works decently. With the current version of R, newer ways have been implemented (S4 and reference classes), but most people still (only) use S3.
To get you started with S3, look at the code for the median function. Typing median at the command prompt reveals that it has one line in its body, namely
UseMethod("median")
That means that it is an S3 generic. In other words, you can have a different median method for different S3 classes. To list all the possible median methods, type
methods(median) #actually not that interesting.
In this case, there's only one method, the default, which is called for anything. You can see the code for that by typing
median.default
A much more interesting example is the print function, which has many different methods.
methods(print) #very exciting
Notice that some of the methods have *s next to their name. That means that they are hidden inside some package's namespace. Use find to find out which package they are in. For example
find("acf") #it's in the stats package
stats:::print.acf
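If you would rather not guess the package first, base R also offers getS3method and getAnywhere for hidden methods. A small sketch, using the same print.acf example (the variable names m1 and m2 are just for illustration):

```r
# Look the method up by generic name and class, wherever it is registered.
m1 <- getS3method("print", "acf")

# Search everything that is loaded, including non-exported objects.
m2 <- getAnywhere("print.acf")

# Where the object was found (namespaces and/or the S3 registry).
print(m2$where)
```

Printing m1 shows the same source code as stats:::print.acf.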
From http://adv-r.had.co.nz/OO-essentials.html:
R’s three OO systems differ in how classes and methods are defined:
S3 implements a style of OO programming called generic-function OO. This is different from most programming languages, like Java, C++ and C#, which implement message-passing OO. With message-passing, messages (methods) are sent to objects and the object determines which function to call. Typically, this object has a special appearance in the method call, usually appearing before the name of the method/message: e.g. canvas.drawRect("blue"). S3 is different. While computations are still carried out via methods, a special type of function called a generic function decides which method to call, e.g., drawRect(canvas, "blue"). S3 is a very casual system. It has no formal definition of classes.
S4 works similarly to S3, but is more formal. There are two major differences to S3. S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.
Reference classes, called RC for short, are quite different from S3 and S4. RC implements message-passing OO, so methods belong to classes, not functions. $ is used to separate objects and methods, so method calls look like canvas$drawRect("blue"). RC objects are also mutable: they don’t use R’s usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.
There’s also one other system that’s not quite OO, but it’s important to mention here:
- base types, the internal C-level types that underlie the other OO systems. Base types are mostly manipulated using C code, but they’re important to know about because they provide the building blocks for the other OO systems.
I came to this question mostly wondering where the names came from. It appears from this Wikipedia article that the names refer to the versions of the S programming language that R is based on. The method dispatching schemes described in the other answers come from S and are labelled according to the version that introduced them.
Try
methods(residuals)
which lists, among others, "residuals.lm" and "residuals.glm". This means when you have fitted a linear model, m, and type residuals(m), residuals.lm will be called. When you have fitted a generalized linear model, residuals.glm will be called.
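To see why the right method is picked, it helps to look at the class attribute of the fitted objects. A quick sketch using the built-in cars data set (the model formula is arbitrary and only there to have something to fit):

```r
m_lm  <- lm(dist ~ speed, data = cars)
m_glm <- glm(dist ~ speed, data = cars, family = gaussian())

class(m_lm)   # "lm"
class(m_glm)  # "glm" "lm": dispatch tries residuals.glm first, then residuals.lm

# Calling the generic gives the same result as calling the lm method directly.
same <- all.equal(residuals(m_lm), getS3method("residuals", "lm")(m_lm))
print(same)
```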
It's kind of the C++ object model turned upside down. In C++, you define a base class having virtual functions, which are overridden by derived classes.
In R you define a virtual (aka generic) function and then you decide which classes will override this function (aka define a method). Note that the classes doing this do not need to be derived from one common super class.
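A minimal sketch of that idea, with a made-up generic area and two made-up classes that share no superclass (all names here are hypothetical, chosen only for the example):

```r
# The "virtual" function: a generic that only dispatches.
area <- function(shape, ...) UseMethod("area")

# Two unrelated classes each provide their own method.
area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h

# Fallback for any class without its own method.
area.default <- function(shape, ...) {
  stop("no area method for class ", class(shape)[1])
}

circ  <- structure(list(r = 1), class = "circle")
rect1 <- structure(list(w = 2, h = 3), class = "rectangle")

print(area(circ))
print(area(rect1))
```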
I would not agree to generally prefer S3 over S4. S4 has more formalism (= more typing) and this may be too much for some applications. S4 classes, however, can be defined like a class or struct in C++. You can specify that an object of a certain class is made up of a string and two numbers for example:
setClass("myClass", representation(label = "character", x = "numeric", y = "numeric"))
Methods that are called with an object of that class can rely on the object having those members. That's very different from S3 classes, which are just a list of a bunch of elements.
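Here is a small sketch of what that guarantee looks like in practice, reusing the myClass definition from above. The generic describe is a made-up name for the example, not part of any package.

```r
library(methods)

setClass("myClass", representation(label = "character", x = "numeric", y = "numeric"))

# A hypothetical generic with a method that relies on the declared slots.
setGeneric("describe", function(object, ...) standardGeneric("describe"))
setMethod("describe", "myClass", function(object, ...) {
  paste0(object@label, ": (", object@x, ", ", object@y, ")")
})

obj <- new("myClass", label = "a", x = 1, y = 2)
print(describe(obj))

# A slot of the wrong type is rejected when the object is created.
bad <- tryCatch(
  new("myClass", label = "a", x = "oops", y = 2),
  error = function(e) "rejected"
)
print(bad)
```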
With S3 and S4, you call a member function by fun(object, args) and not by object$fun(args). If you are looking for something like the latter, have a look at the proto package.
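If you want the object$fun(args) style without installing anything, reference classes from the methods package are another option. This sketch uses reference classes rather than proto, and the Counter class with its add method is invented for the example:

```r
library(methods)

# Hypothetical class with one field and one method that modifies it in place.
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    add = function(n = 1) {
      count <<- count + n   # <<- assigns to the object's field
      invisible(.self)
    }
  )
)

cnt <- Counter$new(count = 0)
cnt$add(2)

# Assignment does not copy: alias and cnt refer to the same object.
alias <- cnt
alias$add(3)
print(cnt$count)  # 5
```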
Here is an updated fast rundown of the numerous R object systems according to "Advanced R, 2nd edition" (CRC Press, 2019) by Hadley Wickham (Chief Scientist at RStudio), which has a web representation here, based on the chapter about Object-Oriented Programming.
The first edition from 2015 has a web representation here, with the corresponding chapter on OO here.
Hadley defines the following to distinguish two distinct approaches to OO programming:
Functional OOP: methods (callable code pieces) belong to generic functions (not to be confused with Java/C# generic methods). Think of the methods as being located in a global lookup table. The method to execute is found by the runtime system based on the name of the function and the type (or object class) of one or more arguments passed to that function (this is called "method dispatch"). Syntax-wise, method calls may look like ordinary function calls: myfunc(object, arg1, arg2). This call would lead the runtime to look for the method associated to the pair ("myfunc", typeof(object)) or possibly ("myfunc", typeof(object), typeof(arg1), typeof(arg2)) if the language supports that. In R's S3, the full name of the method gives the (function-name, class) pair. For example: mean.Date is the method to compute the mean of Dates. Try methods("mean") to list the methods defined for the generic mean. The Functional OOP approach is found for example in the Common Lisp Object System and Julia. Hadley notes that "Compared to R, Julia’s implementation is fully developed and extremely performant."
Encapsulated OOP: methods belong to objects or classes, and method calls typically look like object.method(arg1, arg2). This is called encapsulated because the object encapsulates both data (fields) and behaviour (methods). Think of the method as being located in a lookup table attached to the object or the object's class description. The runtime looks the method up based on method name and possibly the type of one or more arguments. This is the approach found in "popular" OO languages like C++, Java, C#.
In both cases, if inheritance is supported (it probably is), the runtime may traverse the class hierarchy upwards until it has found a match for the call lookup key.
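In S3 that upward traversal is simply a walk along the class vector. A minimal sketch with made-up classes dog, cat and animal and a made-up generic speak:

```r
speak <- function(x, ...) UseMethod("speak")

# Method for the "parent" class.
speak.animal <- function(x, ...) "generic animal noise"

# Method for the "child" class; NextMethod() continues with the next class.
speak.dog <- function(x, ...) paste("Woof, then:", NextMethod())

d <- structure(list(), class = c("dog", "animal"))
k <- structure(list(), class = c("cat", "animal"))

print(speak(k))  # no speak.cat exists, so speak.animal is used
print(speak(d))  # speak.dog runs first, then delegates to speak.animal
```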
library(sloop) # formerly, "pryr"
otype(mtcars)
#> [1] "S3"
R6 (load it with library(R6)) has special names available inside methods (self, private, super) and member functions (functions assigned to fields, but which are not methods, just functions).
There are others, like R.oo (similar to RC), proto (prototype-based, think JavaScript) and Mutatr. However, "Advanced R" says:
Apart from R6, which is widely used, these systems are primarily of theoretical interest. They do have their strengths, but few R users know and understand them, so it is hard for others to read and contribute to your code.
Be sure to read the chapter on trade-offs in "Advanced R, 2nd edition", too.