
Does R copy unevaluated slots in S4 classes on assignment?

Tags: r, s4

Suppose I have an S4 class with two slots. I then create a method that sets one of the slots to something and returns the result. Will the other slot also be copied on assignment?

For example,

setClass('foo', representation(first.slot = 'numeric', second.slot = 'numeric'))
setGeneric('setFirstSlot', function(object, value) {standardGeneric('setFirstSlot')})
setMethod('setFirstSlot', signature('foo', 'numeric'), function(object, value) {
 object@first.slot = value
 return(object)
 }) 

 f <- new('foo')
 f@second.slot <- 2
 f <- setFirstSlot(f, 1)

On the last line, will the values of both the first and second slot be copied, or will there be some sort of optimization? I have a class with one field holding a gigabyte of data and a few fields holding small numeric vectors. I'd like a setter function for the numeric fields that doesn't waste time needlessly copying the large field every time it's used.

Thanks :)

badmax asked Mar 17 '14 06:03



2 Answers

If you are copying large amounts of data in a field, one solution is to use a reference class. Let's compare the reference classes to S4.

## Store timing output
m = matrix(0, ncol=4, nrow=6)

Create a reference class definition:

foo_ref = setRefClass("test", fields = list(x = "numeric", y = "numeric"))

Then time data assignment:

## Reference class
g = function(x) {x$x[1] = 1; return(x)}
for(i in 6:8){
  f = foo_ref$new(x = 1, y = 1)
  y = runif(10^i)
  t1 = system.time({f$y <- y})[3]
  t2 = system.time({f$y[1] = 1})[3]
  t3 = system.time({f$x = 1})[3]
  t4 = system.time({g(f)})[3]
  m[i-5, ] = c(t1, t2, t3, t4)
}

We can repeat for a similar S4 structure:

g = function(x) {x@y[1] = 1; return(x)}
setClass('foo_s4', representation(x = 'numeric', y = 'numeric'))
for(i in 6:8){
  f = new('foo_s4'); f@x = 1; f@y = 1
  y = runif(10^i)
  t1 = system.time({f@y <- y})[3]
  t2 = system.time({f@y[1] <- 1})[3]
  t3 = system.time({f@x = 1})[3]
  t4 = system.time({g(f)})[3]
  m[i-2, ] = c(t1, t2, t3, t4)
}

Results

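For large data sets, assignment with a reference class is much more efficient than with S4 when the object is modified inside a function (the t4 timing), because a reference object is modified in place instead of being copied.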


Notes

  • Results for R version 3.1
  • For R < 3.1, t3 timings for S4 objects were higher.
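
Applied to the setter from the question, a reference class can carry the setter as a method. The following is only a minimal sketch and was not part of the timings above; the class name FooRef, the field names small and big, and the method setSmall are made up for illustration.

```r
library(methods)

# Hypothetical reference class: one small field and one large field
FooRef <- setRefClass("FooRef",
  fields = list(small = "numeric", big = "numeric"),
  methods = list(
    setSmall = function(value) {
      # Modifies the field of this object in place
      small <<- value
      invisible(.self)
    }
  )
)

obj <- FooRef$new(small = 0, big = runif(1e6))
alias <- obj      # a second name for the same object, not a copy
obj$setSmall(1)
alias$small       # the change is visible through both names
```

Because of these reference semantics, obj$copy() is needed whenever an independent copy is wanted.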
csgillespie answered Sep 21 '22 00:09


When the class is used by its developer (who knows its design), assigning directly with @<- may be better than calling a setter method such as setFirstSlot from the question, because direct assignment avoids passing the whole object to a function and returning it.
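
One way to check empirically whether the large slot is duplicated is tracemem. The sketch below assumes an R build with memory profiling enabled (hence the capabilities("profmem") guard); the class fooBig and the function setSmall are hypothetical names. Whether any duplication is reported depends on the R version, so no output is shown.

```r
library(methods)

# Hypothetical class with one small and one large slot
setClass("fooBig", representation(small = "numeric", big = "numeric"))

# Setter written as an ordinary function for brevity
setSmall <- function(object, value) {
  object@small <- value
  object
}

obj <- new("fooBig", small = 0, big = runif(1e6))

# tracemem is only available when R was built with memory profiling
traced <- capabilities("profmem")
if (traced) tracemem(obj@big)   # report any duplication of the large vector

obj@small <- 1                  # direct slot assignment
obj <- setSmall(obj, 2)         # assignment through the setter

if (traced) untracemem(obj@big)
```

Any line starting with "tracemem[" that appears during one of the two assignments indicates that the large vector was copied at that step.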

However, setter methods are useful for preventing users from making assignments that do not match the intended definition of a slot. The operator @<- already checks the class of the value: if we use it to assign a character to the slot x (which was defined as numeric), an error is returned.

setClass('foo', representation(x = 'numeric', y = 'numeric'))
f <- new('foo')
f@x <- 1 # this is ok
f@y <- 2 # this is ok
f@x <- "a"
#Error in checkAtAssignment("foo", "x", "character") : 
#  assignment of an object of class “character” is not valid for @‘x’ in an object of class “foo”; is(value, "numeric") is not TRUE

But imagine a situation where the slot should contain only one element. This requirement on the length of the slot is not checked by @<-:

# this assignment is allowed
f@x <- c(1, 2, 3, 4)
f@x
#[1] 1 2 3 4

In this situation we would like to define a setter method in order to inform the user about further restrictions in the definition of the slot. But then, we have to return the entire object and this may be an extra burden if the object is big.
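
Such a setter could be written as a replacement method. This is a sketch under assumptions: the class name fooChecked and the generic x<- are invented for the example and are not part of the code above.

```r
library(methods)

# Hypothetical class mirroring 'foo', kept separate to be self-contained
setClass("fooChecked", representation(x = "numeric", y = "numeric"))

# Replacement generic, so that the setter can be called as x(obj) <- value
setGeneric("x<-", function(object, value) standardGeneric("x<-"))

setReplaceMethod("x", "fooChecked", function(object, value) {
  # Extra restriction that @<- alone does not enforce
  if (length(value) != 1L)
    stop("slot 'x' must be of length 1")
  object@x <- value
  object   # a replacement method has to return the modified object
})

obj <- new("fooChecked", x = 1, y = 2)
x(obj) <- 5                      # accepted

# A longer vector is rejected and obj is left unchanged
res <- tryCatch(
  { x(obj) <- c(1, 2, 3); "no error" },
  error = function(e) conditionMessage(e)
)
```

The setter still returns the whole object, so it illustrates the check itself and does not address the cost of returning a big object.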

As far as I know, there is no way to fix the length of a slot in the class definition. A validity function registered with setValidity can check this or other requirements on the slots, but @<- does not call validObject, so the assignment f@x <- c(1, 2, 3, 4) is still allowed after the validity function is defined:

valid.foo <- function(object)
{
  if (length(object@x) > 1)
    stop("slot ", sQuote("x"), " must be of length 1")
}
setValidity("foo", valid.foo)
# no error is detected and the assignment is allowed
f@x <- c(1, 4, 6)
f@x
#[1] 1 4 6
# we need to call "validObject" to check if everything is correct
validObject(f)
#Error in validityMethod(object) : slot ‘x’ must be of length 1
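
For comparison, here is a sketch of the same check with a validity function that follows the documented convention of returning TRUE or a character message instead of calling stop(). The class name fooValid is hypothetical. It also shows that new() runs the check on the slots it is given, while a later @<- does not.

```r
library(methods)

# Hypothetical class whose validity function returns TRUE or a message
setClass("fooValid", representation(x = "numeric"),
  validity = function(object) {
    if (length(object@x) > 1L) "slot 'x' must be of length 1" else TRUE
  }
)

ok <- new("fooValid", x = 1)   # passes the validity check

# new() validates the slots it is given, so this call fails
msg <- tryCatch(
  { new("fooValid", x = c(1, 4, 6)); "no error" },
  error = function(e) conditionMessage(e)
)

# Direct slot assignment skips the check until validObject() is called
ok@x <- c(1, 4, 6)
checked <- tryCatch(validObject(ok), error = function(e) conditionMessage(e))
```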

A possible solution is to modify the object in-place. The method set.x.inplace below is based on this approach.

setGeneric("set.x.inplace", function(object, val){ standardGeneric("set.x.inplace") })
setMethod("set.x.inplace", "foo", function(object, val)
{
  if (length(val) == 1) {
    eval(eval(substitute(expression(object@x <<- val))))
  } else
    stop("slot ", sQuote("x"), " must be of length 1")
  #return(object) # not necessary
})

set.x.inplace(f, 6)
f
#An object of class "foo"
#Slot "x":
#[1] 6
#Slot "y":
#[1] 2

# the assignment is not allowed
set.x.inplace(f, c(1,2,3))
#Error in set.x.inplace(f, c(1, 2, 3)) : slot ‘x’ must be of length 1

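As this method does not return the object, it can be a good alternative for large objects. Note that it relies on <<-, which looks the variable up in the environments enclosing the method (normally the global environment), so it only works when the object passed in is a variable visible from there.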

javlacalle answered Sep 22 '22 00:09