 

How to define the subset operators for an S4 class?

Tags: oop, r, subset, s4

I am having trouble figuring out the proper way to define the [, $, and [[ subset operators for an S4 class.

Can anyone provide me with a basic example of defining these three for an S4 class?

Kyle Brandt asked Jun 09 '12 14:06




2 Answers

First, look up the generic so that we know which signature we are aiming for:

> getGeneric("[")
standardGeneric for "[" defined from package "base"

function (x, i, j, ..., drop = TRUE) 
standardGeneric("[", .Primitive("["))
<bytecode: 0x32e25c8>
<environment: 0x32d7a50>
Methods may be defined for arguments: x, i, j, drop
Use  showMethods("[")  for currently available ones.

Define a simple class

setClass("A", representation=representation(slt="numeric"))

and implement a method

setMethod("[", c("A", "integer", "missing", "ANY"),
    ## we won't support subsetting on j; dispatching on 'drop' doesn't
    ## make sense (to me), so in rebellion we'll quietly ignore it.
    function(x, i, j, ..., drop=TRUE)
{
    ## less clever: update slot, return instance
    ## x@slt = x@slt[i]
    ## x
    ## clever: by default initialize is a copy constructor, too
    initialize(x, slt=x@slt[i])
})

In action:

> a = new("A", slt=1:5)
> a[3:1]
An object of class "A"
Slot "slt":
[1] 3 2 1

There are different strategies for supporting the (implicitly) many signatures; for instance, you would likely also want to support logical and character index values, possibly for both i and j. The most straightforward is a "facade" pattern, where each method does some preliminary coercion to a common type of subset index (e.g., integer, which allows re-ordering and repetition of index entries) and then uses callGeneric to invoke a single method that does the actual work of subsetting the class.
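A minimal sketch of that facade pattern, under stated assumptions: it reuses the single-slot class "A", handles only i (j must be missing), and treats character indices as names of the slt vector. Each facade coerces its index to integer and re-dispatches with a plain x[...] call, which lands on the one integer method that does the real work; callGeneric(), as suggested above, is an alternative way to re-dispatch.

```r
library(methods)

## Assumption: the same single-slot class "A" used in the answer.
setClass("A", representation(slt = "numeric"))

## The one method that actually subsets: integer i, j missing.
setMethod("[", signature(x = "A", i = "integer", j = "missing"),
    function(x, i, j, ..., drop = TRUE) {
        initialize(x, slt = x@slt[i])
    })

## Facade for doubles: truncate to integer (as base R indexing does), then re-dispatch.
setMethod("[", signature(x = "A", i = "numeric", j = "missing"),
    function(x, i, j, ..., drop = TRUE) {
        x[as.integer(i)]
    })

## Facade for logicals: seq_along()[i] reproduces base R recycling and NA handling.
setMethod("[", signature(x = "A", i = "logical", j = "missing"),
    function(x, i, j, ..., drop = TRUE) {
        x[seq_along(x@slt)[i]]
    })

## Facade for characters: assumes i refers to names(x@slt).
setMethod("[", signature(x = "A", i = "character", j = "missing"),
    function(x, i, j, ..., drop = TRUE) {
        x[match(i, names(x@slt))]
    })
```

With these in place, a[2:1], a[c(1, 3)] and a[c(TRUE, FALSE)] all return an "A" object, and only the integer method touches the slot. Supporting j for a two-dimensional class follows the same idea.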

There are no conceptual differences for [[, other than respecting its semantics: [[ returns the content, rather than another instance of the object as [ does. For $ we have

> getGeneric("$")
standardGeneric for "$" defined from package "base"

function (x, name) 
standardGeneric("$", .Primitive("$"))
<bytecode: 0x31fce40>
<environment: 0x31f12b8>
Methods may be defined for arguments: x
Use  showMethods("$")  for currently available ones.

and

setMethod("$", "A",
    function(x, name)
{
    ## 'name' is a character(1)
    slot(x, name)
})

with

> a$slt
[1] 1 2 3 4 5
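The [[ case mentioned above differs from [ only in what it returns. A minimal sketch, assuming the same single-slot class "A" and assuming (as a design choice, not a requirement) that a[[i]] should return one element of the slt slot rather than a new "A" object:

```r
library(methods)

## Assumption: the same single-slot class "A" used in the answer.
setClass("A", representation(slt = "numeric"))

## [[ extracts content: a single element of the slot, not a new "A".
## The "[[" generic has formals (x, i, j, ...); j is not supported here.
## Integer indices also match, because "integer" extends "numeric".
setMethod("[[", signature(x = "A", i = "numeric", j = "missing"),
    function(x, i, j, ...) {
        x@slt[[i]]
    })

## $ as defined in the answer: 'name' arrives as a character(1).
setMethod("$", "A", function(x, name) slot(x, name))
```

So a[[2]] returns a plain number, while [ returns another "A" object.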
Martin Morgan answered Oct 20 '22 17:10


I would do as @Martin_Morgan suggested for the operators you mentioned. I would add a couple of points though:

1) I would be careful about defining a $ operator to access an S4 slot (unless you intend, say, to access a column of a data frame stored in a specific slot). The general suggestion is to write accessor functions like getMySlot() and setMySlot() to get the information you need. Inside those accessors you can use the @ operator to read the slots, but get and set functions are better as a user interface. Using $ could be confusing for users, who would probably expect a data.frame. See the S4 tutorial by Christophe Genolini for an in-depth discussion of these issues. If this is not how you intended to use $, disregard my suggestion (but the tutorial is still a great resource!).

2) If you are defining [ and [[ for a class that inherits from another class, like vector, you will also want to define el() (equivalent to x[i][[1L]], i.e., the first element of a subset) and length(). I am currently writing a class that inherits from numeric, and numeric methods will automatically try to use these functions from your class. If the class is for more limited use, or just your own, this may not be a problem.
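The two points above can be sketched together. This is a minimal illustration, not a definitive design: the class Series, its values slot, and the accessor names getValues() and setValues<-() are hypothetical, following the getter/setter idea from point 1, and writing the setter as a replacement function is an assumption. length() is a primitive that accepts S4 methods, and methods::el(object, where), which as point 2 notes amounts to object[where][[1L]], works once [ and [[ exist.

```r
library(methods)

## Hypothetical class standing in for the user's own class.
setClass("Series", representation(values = "numeric"))

## Point 1: accessor functions instead of $ (names are hypothetical).
setGeneric("getValues", function(object) standardGeneric("getValues"))
setMethod("getValues", "Series", function(object) object@values)

setGeneric("setValues<-", function(object, value) standardGeneric("setValues<-"))
setReplaceMethod("setValues", "Series", function(object, value) {
    object@values <- value
    validObject(object)  # re-check the object after modifying it
    object
})

## Point 2: length(), [ and [[ kept consistent with each other.
setMethod("length", "Series", function(x) length(x@values))

setMethod("[", signature(x = "Series", i = "numeric", j = "missing"),
    function(x, i, j, ..., drop = TRUE) {
        initialize(x, values = x@values[i])
    })

setMethod("[[", signature(x = "Series", i = "numeric", j = "missing"),
    function(x, i, j, ...) {
        x@values[[i]]
    })
```

Usage: after s <- new("Series", values = c(10, 20, 30)), users call getValues(s) and setValues(s) <- c(1, 2) rather than touching slots, and length(s) and el(s, 1) go through the class's own methods.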

I apologize; I would have left this as a comment, but I'm new to SO and don't have the rep yet!

Eli Sander answered Oct 20 '22 19:10