
What does "S3 methods" mean in R?

Tags: oop, r, r-faq, r-s3, r-s4



Most of the relevant information can be found by looking at ?S3 or ?UseMethod, but in a nutshell:

S3 refers to a scheme of method dispatching. If you've used R for a while, you'll notice that there are print, predict and summary methods for a lot of different kinds of objects.

In S3, this works by:

  • setting the class of objects of interest (e.g. the value returned by a call to the function glm has class "glm", more precisely c("glm", "lm"))
  • providing a method whose name is the general name (e.g. print), then a dot, and then the class name (e.g. print.glm)
  • the general name (print) must already have been set up as a generic function, i.e. a function whose body calls UseMethod("print"), for this to work. If you simply want to add methods to existing generics, you don't need to do this yourself (see the help pages referred to earlier if you do).
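A minimal sketch of those steps, using the existing print generic. The class name temperature and the constructor new_temperature() are hypothetical, chosen only for illustration.

```r
# Hypothetical constructor (step 1): attach a class attribute to a base object.
new_temperature <- function(celsius) {
  stopifnot(is.numeric(celsius))
  structure(list(celsius = celsius), class = "temperature")
}

# Step 2: a method named <generic>.<class> for the existing print() generic.
print.temperature <- function(x, ...) {
  cat("Temperature:", format(x$celsius), "degrees C\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

t1 <- new_temperature(21.5)
print(t1)   # dispatches to print.temperature: "Temperature: 21.5 degrees C"
```

Because print is already a generic, defining print.temperature is all it takes; the UseMethod preparation is only needed when you invent a new generic name.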

To the eye of the beholder, and particularly, the user of your newly created funky model fitting package, it is much more convenient to be able to type predict(myfit, type="class") than predict.mykindoffit(myfit, type="class").
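To make that concrete, here is a sketch with a made-up model class. fit_mykindoffit(), its least-squares line and the 0.5 cut-off used for type = "class" are all hypothetical; only predict() itself is the real generic from the stats package. For simplicity, newdata is a plain numeric vector rather than a data frame.

```r
# Hypothetical fitting function: returns a list with class "mykindoffit".
fit_mykindoffit <- function(x, y) {
  slope <- cov(x, y) / var(x)
  structure(list(slope = slope, intercept = mean(y) - slope * mean(x)),
            class = "mykindoffit")
}

# Method for the stats::predict() generic; it keeps predict's (object, ...) shape.
predict.mykindoffit <- function(object, newdata, type = c("response", "class"), ...) {
  type <- match.arg(type)
  yhat <- object$intercept + object$slope * newdata
  if (type == "class") ifelse(yhat > 0.5, "yes", "no") else yhat
}

myfit <- fit_mykindoffit(x = c(0, 1, 2, 3), y = c(0, 0, 1, 1))
predict(myfit, newdata = c(0.5, 2.5), type = "class")              # "no" "yes"
predict.mykindoffit(myfit, newdata = c(0.5, 2.5), type = "class")  # same, long form
```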

There is quite a bit more to it, but this should get you started. There are quite a few disadvantages to dispatching methods based on an attribute (the class) of objects (and C purists probably lie awake at night in horror of it), but for a lot of situations it works decently. R also offers newer systems (S4 and reference classes), but most people still use only S3.


To get you started with S3, look at the code for the median function. Typing median at the command prompt reveals that it has one line in its body, namely

UseMethod("median")

That means that median is an S3 generic function. In other words, you can have a different median method for each S3 class. To list all the available median methods, type

methods(median) #actually not that interesting.  

In this case there's typically only one method, the default, which is used for any object that has no more specific method. You can see its code by typing

median.default
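Adding a method to median works the same way. The class weighted_values, its constructor and the weighted-median rule below are hypothetical and only meant as an illustration; the method keeps the generic's (x, na.rm, ...) arguments so that calls like median(v, na.rm = TRUE) still make sense.

```r
# Hypothetical constructor: values paired with non-negative weights.
new_weighted_values <- function(values, weights) {
  stopifnot(length(values) == length(weights), all(weights >= 0))
  structure(list(values = values, weights = weights), class = "weighted_values")
}

# Method for the median() generic; it keeps the generic's (x, na.rm, ...) arguments.
median.weighted_values <- function(x, na.rm = FALSE, ...) {
  vals <- x$values
  wts <- x$weights
  if (anyNA(vals)) {
    if (!na.rm) return(NA_real_)
    keep <- !is.na(vals)
    vals <- vals[keep]
    wts <- wts[keep]
  }
  ord <- order(vals)
  cum <- cumsum(wts[ord])
  # First sorted value whose cumulative weight reaches half of the total weight.
  vals[ord][which(cum >= sum(wts) / 2)[1]]
}

v <- new_weighted_values(c(1, 2, 3, 10), weights = c(1, 1, 1, 5))
median(v)               # 10: dispatches to median.weighted_values
median(c(1, 2, 3, 10))  # 2.5: a plain numeric vector still uses median.default
```

After defining it, methods(median) should also list median.weighted_values next to median.default.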

A much more interesting example is the print function, which has many different methods.

methods(print)  #very exciting

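Notice that some of the methods have an asterisk (*) next to their name. That means they are not exported from their package's namespace, so you cannot call them by name directly. To find out which package one belongs to, use find on the related exported function, then access the method with :::. For example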

find("acf")  #it's in the stats package
stats:::print.acf
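As an alternative to guessing the package, the utils package (attached by default) has two lookup helpers, getS3method() and getAnywhere(), which also work for unexported methods. A short sketch:

```r
# getS3method() returns the method print() would dispatch to for class "acf",
# even though print.acf is not exported from the stats namespace.
m <- getS3method("print", "acf")
identical(m, stats:::print.acf)   # TRUE: the same function as the ::: lookup

# getAnywhere() reports every place where an object with this name can be found.
getAnywhere("print.acf")$where
```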

From http://adv-r.had.co.nz/OO-essentials.html:

R’s three OO systems differ in how classes and methods are defined:

  • S3 implements a style of OO programming called generic-function OO. This is different from most programming languages, like Java, C++ and C#, which implement message-passing OO. With message-passing, messages (methods) are sent to objects and the object determines which function to call. Typically, this object has a special appearance in the method call, usually appearing before the name of the method/message: e.g. canvas.drawRect("blue"). S3 is different. While computations are still carried out via methods, a special type of function called a generic function decides which method to call, e.g., drawRect(canvas, "blue"). S3 is a very casual system. It has no formal definition of classes.

  • S4 works similarly to S3, but is more formal. There are two major differences to S3. S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.

  • Reference classes, called RC for short, are quite different from S3 and S4. RC implements message-passing OO, so methods belong to classes, not functions. $ is used to separate objects and methods, so method calls look like canvas$drawRect("blue"). RC objects are also mutable: they don’t use R’s usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.

There’s also one other system that’s not quite OO, but it’s important to mention here:

  • base types, the internal C-level types that underlie the other OO systems. Base types are mostly manipulated using C code, but they’re important to know about because they provide the building blocks for the other OO systems.
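The difference between the two call styles in the quote can be sketched in base R. drawRect, canvas and Canvas are hypothetical names borrowed from the quote's example, not an existing API; the sketch only shows where the method lives and whether the object is copied or modified in place.

```r
library(methods)  # setRefClass() lives here

# Generic-function OO (S3): the generic decides which method runs.
drawRect <- function(canvas, colour) UseMethod("drawRect")  # hypothetical generic
drawRect.canvas <- function(canvas, colour) {
  canvas$shapes <- c(canvas$shapes, paste("rect", colour))
  canvas  # copy-on-modify: the caller must keep the returned copy
}
s3_canvas <- structure(list(shapes = character()), class = "canvas")
drawRect(s3_canvas, "blue")   # returns a modified copy
length(s3_canvas$shapes)      # 0: the original is unchanged

# Message-passing OO (RC): the method belongs to the object's class.
Canvas <- setRefClass("Canvas",
  fields = list(shapes = "character"),
  methods = list(
    drawRect = function(colour) {
      shapes <<- c(shapes, paste("rect", colour))  # modifies the field in place
      invisible(.self)
    }
  )
)
rc_canvas <- Canvas$new()
rc_canvas$drawRect("blue")
length(rc_canvas$shapes)      # 1: the object itself was modified
```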

I came to this question mostly wondering where the names came from. It appears from this Wikipedia article that the names refer to versions 3 and 4 of the S programming language, on which R is based. The method dispatching schemes described in the other answers come from S and are labelled according to the version that introduced them.


Try

methods(residuals)

which lists, among others, "residuals.lm" and "residuals.glm". This means that when you have fitted a linear model, m, and type residuals(m), residuals.lm will be called. When you have fitted a generalized linear model, residuals.glm will be called. It's kind of the C++ object model turned upside down. In C++, you define a base class with virtual functions, which are overridden by derived classes. In R, you define a virtual (aka generic) function and then decide which classes will override this function (aka define a method). Note that the classes doing this do not need to be derived from one common superclass.

I would not agree that S3 should generally be preferred over S4. S4 has more formalism (= more typing), and this may be too much for some applications. S4 classes, however, can be defined like a class or struct in C++. You can specify, for example, that an object of a certain class is made up of a string and two numbers:
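You can watch this dispatch happen with the built-in mtcars data. Fitting the same straight-line model with lm() and with a gaussian glm() gives objects of different classes, so the same residuals() call is routed to different methods:

```r
# Fit the same linear model twice: once with lm(), once with a gaussian glm().
fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(mpg ~ wt, data = mtcars)  # family = gaussian() by default

class(fit_lm)    # "lm"
class(fit_glm)   # "glm" "lm"

# Same generic call, different methods chosen from the class attribute.
r_lm  <- residuals(fit_lm)   # residuals.lm
r_glm <- residuals(fit_glm)  # residuals.glm (deviance residuals by default)

# For a gaussian glm, deviance residuals equal the ordinary y - fitted residuals.
all.equal(r_lm, r_glm)
```

Because class(fit_glm) is c("glm", "lm"), dispatch looks for residuals.glm first and would fall back to residuals.lm if no glm method existed.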

setClass("myClass", slots = c(label = "character", x = "numeric", y = "numeric"))

Methods that are called with an object of that class can rely on the object having those slots, with those types. That's very different from S3 classes, whose objects are usually just lists with a class attribute and no guaranteed set of elements.
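A minimal sketch of what that formality buys. It re-declares myClass with the slots= argument so the block runs on its own, and distanceFromOrigin is a hypothetical generic added for illustration:

```r
library(methods)

# Same class as in the answer, declared with the slots= argument.
setClass("myClass", slots = c(label = "character", x = "numeric", y = "numeric"))

p <- new("myClass", label = "point A", x = 3, y = 4)
p@x   # slots are read with @

# Hypothetical generic plus a method that can rely on slots x and y existing.
setGeneric("distanceFromOrigin", function(obj) standardGeneric("distanceFromOrigin"))
setMethod("distanceFromOrigin", "myClass", function(obj) sqrt(obj@x^2 + obj@y^2))
distanceFromOrigin(p)   # 5

# A slot of the wrong type is rejected when the object is created.
res <- tryCatch(new("myClass", label = "bad", x = "three", y = 4),
                error = function(e) conditionMessage(e))
res
```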

With S3 and S4, you call a method as fun(object, args), not as object$fun(args). If you are looking for something like the latter, have a look at the proto package.


Here is an updated quick rundown of R's numerous object systems, based on the chapter about object-oriented programming in "Advanced R, 2nd edition" (CRC Press, 2019) by Hadley Wickham (Chief Scientist at RStudio), which is also available online here.


The first edition from 2015 is also available online here, with the corresponding chapter on OO here.

Approaches to OO systems

Hadley distinguishes two approaches to OO programming:

Functional OOP: methods (callable pieces of code) belong to generic functions (not to be confused with Java/C# generic methods). Think of the methods as being located in a global lookup table. The runtime system finds the method to execute based on the name of the function and the type (or object class) of one or more arguments passed to that function; this is called "method dispatch". Syntax-wise, method calls look like ordinary function calls: myfunc(object, arg1, arg2). This call would lead the runtime to look for the method associated with the pair ("myfunc", typeof(object)), or possibly ("myfunc", typeof(object), typeof(arg1), typeof(arg2)) if the language supports dispatch on several arguments. In R's S3, the full name of the method encodes the (function-name, class) pair. For example, mean.Date is the method that computes the mean of Dates. Try methods("mean") to list the methods of the generic function mean. The functional OOP approach is also found in the Common Lisp Object System (CLOS) and Julia. Hadley notes that "Compared to R, Julia’s implementation is fully developed and extremely performant."

Encapsulated OOP: methods belong to objects or classes, and method calls typically look like object.method(arg1, arg2). This is called encapsulated because the object encapsulates both data (fields) and behaviour (methods). Think of the method as being located in a lookup table attached to the object or the object's class description. The runtime looks the method up based on method name and possibly the type of one or more arguments. This is the approach found in "popular" OO languages like C++, Java, C#.
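Base R has no object.method() syntax, but an environment accessed with $ gives the same shape: the object carries its own table of methods. The sketch below is a hypothetical counter built from a plain environment; it illustrates the idea only and is not the RC or R6 API.

```r
# Hypothetical constructor: the "object" is an environment holding both its
# data (count) and its methods (increment, get), i.e. a per-object lookup table.
new_counter <- function() {
  self <- new.env()
  self$count <- 0
  self$increment <- function(by = 1) {
    self$count <- self$count + by  # environments are modified in place
    invisible(self)
  }
  self$get <- function() self$count
  self
}

counter <- new_counter()
counter$increment()
counter$increment(5)
counter$get()   # 6
```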

In both cases, if inheritance is supported (it probably is), the runtime may traverse the class hierarchy upwards until it has found a match for the call lookup key.
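The lookup-table idea and the walk up the class hierarchy can be sketched as a toy S3-style dispatcher. toy_dispatch() and the describe.* methods are hypothetical; the real UseMethod() also consults registered methods and implicit classes, which this sketch ignores.

```r
# Hypothetical toy dispatcher: looks up "<generic>.<class>" for each class in
# class(x), most specific first, and finally falls back to "<generic>.default".
toy_dispatch <- function(generic, x, ...) {
  env <- parent.frame()  # search from the caller's environment, like a global table
  for (cls in c(class(x), "default")) {
    method <- get0(paste(generic, cls, sep = "."), envir = env, mode = "function")
    if (!is.null(method)) return(method(x, ...))
  }
  stop("no method for ", generic, " and class ", paste(class(x), collapse = "/"))
}

describe.animal  <- function(x, ...) "some animal"
describe.dog     <- function(x, ...) "a dog"
describe.default <- function(x, ...) "something else"

rex   <- structure(list(), class = c("dog", "animal"))
felix <- structure(list(), class = c("cat", "animal"))

toy_dispatch("describe", rex)    # "a dog" (exact class match)
toy_dispatch("describe", felix)  # "some animal" (inherited from "animal")
toy_dispatch("describe", 42)     # "something else" (default fallback)
```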

How to find out what system an R object belongs to

library(sloop) # formerly, "pryr"
otype(mtcars)
#> [1] "S3"
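If you'd rather not install sloop, roughly the same classification can be made with base R checks. object_system() below is a hypothetical helper that only approximates otype(); sloop's own implementation may differ in edge cases.

```r
library(methods)

# Hypothetical helper approximating sloop::otype() with base R checks.
object_system <- function(x) {
  if (!is.object(x)) return("base")  # no class attribute: a base type
  if (isS4(x)) {
    if (is(x, "envRefClass")) "RC" else "S4"  # RC objects are S4 objects too
  } else if (inherits(x, "R6")) {
    "R6"
  } else {
    "S3"
  }
}

setClass("Point", slots = c(x = "numeric"))
Account <- setRefClass("Account", fields = list(balance = "numeric"))

object_system(1:3)                       # "base"
object_system(mtcars)                    # "S3"
object_system(new("Point", x = 1))       # "S4"
object_system(Account$new(balance = 0))  # "RC"
```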

The R object systems

S3

  • Functional OOP approach.
  • Most important system according to Hadley.
  • Simplest, most common. First OO system used by R.
  • Comes with base R, used throughout base R.
  • Relies on conventions rather than enforced guarantees.
  • See Chambers, John M, and Trevor J Hastie. 1992. "Statistical Models in S." Wadsworth & Brooks/Cole Advanced Books & Software.
  • Details in "Advanced R, 2nd edition" here.
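To make "relies on conventions rather than enforced guarantees" concrete: S3 lets you attach any class to any object, and nothing complains until a method trips over the missing structure. A small sketch (the object x is deliberately nonsensical):

```r
# S3 never checks that an object matches the class it claims to be.
x <- structure(1:3, class = "lm")  # an integer vector labelled as a linear model
inherits(x, "lm")                  # TRUE: the label is accepted without complaint

# The mismatch only surfaces when a method assumes the usual lm structure.
result <- tryCatch(summary(x), error = function(e) conditionMessage(e))
result  # an error message: summary.lm() expects a list with elements like $rank
```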

S4

  • Functional OOP approach.
  • Third most important system according to Hadley.
  • Rewrite of S3, therefore similar to S3, but more formal and more strict: it forces you to think carefully about program design. Suited for building large systems (e.g. the Bioconductor project).
  • Implemented in the base "methods" package.
  • See: Chambers, John M. 1998. "Programming with Data: A Guide to the S Language." Springer.
  • Details in "Advanced R, 2nd edition" here.
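S4's multiple dispatch (choosing a method from the classes of several arguments, not just the first) can be sketched as follows. combineValues is a hypothetical generic; setGeneric(), setMethod() and signature() are the real functions from the methods package.

```r
library(methods)

# Hypothetical generic whose method is chosen from the classes of BOTH arguments.
setGeneric("combineValues", function(x, y) standardGeneric("combineValues"))

setMethod("combineValues", signature("numeric", "character"),
          function(x, y) paste("number then text:", x, y))
setMethod("combineValues", signature("character", "numeric"),
          function(x, y) paste("text then number:", x, y))

combineValues(1, "a")   # "number then text: 1 a"
combineValues("a", 1)   # "text then number: a 1"
```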

RC aka "Reference Classes"

  • Encapsulated OOP approach.
  • Comes with base R.
  • Based on S4.
  • RC objects are a special type of S4 object that is also "mutable", i.e. instead of using R's usual copy-on-modify semantics, they can be modified in place. Note that mutable state is hard to reason about and a source of ugly bugs, but it can lead to more efficient code in certain applications.

R6

  • Encapsulated OOP approach.
  • Second most important system according to Hadley.
  • Can be found in the R6 package (install with install.packages("R6"), then load with library(R6))
  • Similar to RC, but lighter & much faster: it does not depend on S4 or the methods package. Built on top of R environments. Also has:
    • public and private methods
    • active bindings (fields that, when accessed, actually call a method)
    • class inheritance which works across packages
    • both class methods (code that belongs to the class and can access an instance via self, private, super) and member functions (functions assigned to fields, but which are not methods, just functions)
  • Provides a standardised way to escape R's "copy-on-modify" semantics
  • See the package site: "R6: Encapsulated object-oriented programming for R".
  • Details in "Advanced R, 2nd edition" here.
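A short R6 sketch of some of the features listed above: public methods, a private field, an active binding and reference semantics. It assumes the R6 package is installed; the Counter class itself is hypothetical.

```r
# Requires the R6 package: install.packages("R6")
library(R6)

Counter <- R6Class("Counter",
  public = list(
    initialize = function(start = 0) {
      private$count <- start
    },
    add = function(n = 1) {
      private$count <- private$count + n  # mutates the object in place
      invisible(self)                     # returning self enables chaining
    }
  ),
  private = list(
    count = NULL  # hidden from code outside the class
  ),
  active = list(
    total = function(value) {  # active binding: reading it calls this function
      if (!missing(value)) stop("total is read-only")
      private$count
    }
  )
)

a <- Counter$new()
b <- a            # b refers to the SAME object (reference semantics)
a$add()$add(5)    # chaining works because add() returns self
b$total           # 6
```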

Others

There are others, like R.oo (similar to RC), proto (prototype-based, think JavaScript) and Mutatr. However, "Advanced R" says:

Apart from R6, which is widely used, these systems are primarily of theoretical interest. They do have their strengths, but few R users know and understand them, so it is hard for others to read and contribute to your code.

Be sure to read the chapter on trade-offs in "Advanced R, 2nd edition", too.