
How to create a sub-class of data.frame with additional features

Tags: r, s4

I want to create a class that is pretty much a data frame, with a couple of enhancements (extra functions, extra properties), and am wondering what the best way to do it is. The class is basically a data frame, but with some additional attributes such as the schema of that data frame (named "form" below, auto-derived, represented as a data frame, used to cast the data frame into the right types), and a couple of other things. When users use this object in other functions that do not recognize its special type, I want them to deal with the data.frame part of the object. What is the best way to do this?

The two methods I have found are both unsatisfactory. I list them below, together with the issues I still see and am trying to solve.

Method 1, use "data.frame" as "base" slot (inspired by this SO post)

setClass("formhubData", representation(form="data.frame"), contains="data.frame")
fd <- new('formhubData', data.frame(x=c(1,2)), form=data.frame(name='x', type='select one', label='X'))

This method allows me to do things like:

fd$x                  >> 1 2
names(fd)             >> "x"

[Update: turns out the "break down" was being caused by my environment, in which I was calling setClass('formhubData', ...) repeatedly with different arguments. In a fresh R session, all of the below functions work as expected.]

But it breaks down pretty quickly:

nrow(fd)              >> NULL
colnames(fd)          >> NULL

Unlike the post linked above, even the simple is.data.frame doesn't work for me

is.data.frame(fd)     >> FALSE

Method 2, use "data" slot (inspired by SP)

setClass("formhubData", representation(data="data.frame", form="data.frame"))
fd <- new('formhubData', data=data.frame(x=c(1,2)), form=data.frame(name='x', type='select one', label='X'))

I lose the default definitions:

fd$x             >> NULL
names(fd)        >> integer(0)

But, at least I can re-define most of them (still have to learn about [, [[, etc.):

 dim.formhubData <- function(x) dim(x@data)
 names.formhubData <- function(x) names(x@data)
 nrow(fd)        >> 2
 names(fd)       >> "x"
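Element access can be forwarded in the same style. The following is only a sketch under the Method 2 definition above: the two methods simply hand the request to the @data slot and do nothing to keep form in sync.

```r
library(methods)

# Same class and object as in Method 2.
setClass("formhubData", representation(data = "data.frame", form = "data.frame"))
fd <- new("formhubData",
          data = data.frame(x = c(1, 2)),
          form = data.frame(name = "x", type = "select one", label = "X"))

# S3-style methods that forward element access to the @data slot.
# For `$`, R passes the column name to the method as a character string.
`$.formhubData`  <- function(x, name) x@data[[name]]
`[[.formhubData` <- function(x, i) x@data[[i]]

fd$x
fd[["x"]]
```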

However, it seems that I can't express the fact that, for any method that takes a data.frame, my class should act as a passthrough to its @data slot. I feel the need for something like *.formhubData <- function(x, ...) *(x@data, ...), rather than trying to guess all the functions that the clients of my class might use and defining them one by one like dim.formhubData, names.formhubData, etc.

Are there any ways to achieve something like this?

asked May 05 '13 18:05 by prabhasp


1 Answer

While both approaches work to some extent, I'd actually suggest method 2. 'Standard' object-oriented considerations about 'is-a' versus 'has-a' designs generally fall out in favor of 'has-a'. Further, in R, methods can be added to objects at any time, so in some ways 'is-a' is advertising that it makes sense to do any number of perhaps arbitrary things to your class. This is a hard contract to fulfill, even for well-defined operations like subsetting: presumably, if the user drops or adds rows or columns in the underlying data of formhubData, you'd like to update the information in form.

Instead, it seems like you'd really like to implement a 'has-a' relationship, and use the opportunity to restrict the interface to operations that make sense. You can still get substantial code re-use with minimal new code by simple dispatch to underlying implementations, e.g.,

setMethod("dim", "formhubData", function(x) dim(x@data))

gives you nrow and ncol, for instance. For common operations (e.g., subsetting), you'd like to provide implementations that respect the integrity of your data structure. And if it really is the case that the user should be able to do pretty much arbitrary things, you can provide simple 'accessors' to data, perhaps using the setter to do whatever is required to bring the form field into line with the updated data.frame provided by the user.
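Put together, the 'has-a' design described above might look roughly like the sketch below. The names deriveForm, data_of and the formhubData() constructor are hypothetical and introduced only for illustration; deriveForm in particular is a stand-in for however the real form is derived, and records just a column name and class.

```r
library(methods)

# Hypothetical helper: rebuild a minimal schema from a data.frame.
deriveForm <- function(df) {
  data.frame(name = names(df),
             type = unname(vapply(df, function(col) class(col)[1], character(1))),
             stringsAsFactors = FALSE)
}

setClass("formhubData", representation(data = "data.frame", form = "data.frame"))

# Hypothetical constructor that derives the form from the data.
formhubData <- function(data) {
  new("formhubData", data = data, form = deriveForm(data))
}

# Simple dispatch to the underlying data.frame; nrow() and ncol() follow from dim().
setMethod("dim", "formhubData", function(x) dim(x@data))

# Subsetting that keeps data and form consistent (the drop argument is ignored here).
setMethod("[", "formhubData", function(x, i, j, ..., drop = TRUE) {
  rows <- if (missing(i)) seq_len(nrow(x@data)) else i
  cols <- if (missing(j)) seq_along(x@data) else j
  sub <- x@data[rows, cols, drop = FALSE]
  new("formhubData", data = sub, form = deriveForm(sub))
})

# Accessor and setter; the setter brings form back into line with the new data.
setGeneric("data_of", function(x) standardGeneric("data_of"))
setMethod("data_of", "formhubData", function(x) x@data)

setGeneric("data_of<-", function(x, value) standardGeneric("data_of<-"))
setMethod("data_of<-", "formhubData", function(x, value) {
  x@data <- value
  x@form <- deriveForm(value)
  x
})

# Usage
fd <- formhubData(data.frame(x = c(1, 2), y = c("a", "b")))
nrow(fd)
sub <- fd[1, "x"]
data_of(fd) <- data.frame(z = 1:3)
fd@form
```

Users who need arbitrary data.frame operations can pull the data out with data_of(fd), work on it, and assign it back; the setter is the one place where form has to be kept consistent.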

answered Oct 28 '22 07:10 by Martin Morgan