
How to create a dataframe of user defined S4 classes in R

Tags: r, s4

I want to create a data.frame of different variables, including S4 classes. For a built-in class like "POSIXlt" (for dates) this works fine:

as.data.frame(list(id=c(1,2), 
                   date=c(as.POSIXlt('2013-01-01'),as.POSIXlt('2013-01-02'))))

But now I have a user-defined class, say a "person" class with a name and an age:

setClass("person", representation(name="character", age="numeric"))

But the following fails:

as.data.frame(list(id=c(1,2), pers=c(new("person", name="John", age=20),
                                     new("person", name="Tom", age=30))))

I also tried to overload the "[" operator for the person class using

setMethod(
  f = "[",
  signature="person",
  definition=function(x,i,j,...,drop=TRUE){ 
    initialize(x, name=x@name[i], age = x@age[i])
  }
)

This allows for vector-like behavior:

persons = new("person", name=c("John","Tom"), age=c(20,30))
p1 = persons[1]

But still the following fails:

as.data.frame(list(id=c(1,2), pers=persons))

Perhaps I have to overload more operators to get the user-defined class into a data frame? I am sure there must be a way to do this, as POSIXlt is an S4 class and it works! Any solution using the new R5 reference classes would also be fine!

I do not want to put all my data into the person class (you could ask why "id" is not a member of person, and why I do not just avoid data frames)! The idea is that my data.frame represents a table from a database with many columns of different types, e.g., strings and numbers, but also dates, intervals, geo-objects, etc. While I already have a solution for dates (POSIXlt), I probably need to specify my own S4/R5 classes for intervals, geo-objects, etc.

Thanks a lot in advance.

Patrick Roocks asked Jan 30 '13 12:01



2 Answers

Judging by this thread on the mailing list:

http://tolstoy.newcastle.edu.au/R/e2/devel/06/11/1013.html

...John Chambers was thinking about this in 2006. And we still can't put S4 objects in columns of data frames. We can't put complex S3 classes in columns of data frames either.
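As an aside, base R does let a data frame carry a list column, which is a different technique from a native S4 column: each row holds one object, and the column itself is just a list wrapped in I(). A minimal sketch, assuming the "person" class from the question; the variable names john, tom and ages are only for illustration:

```r
# Sketch only: a base data.frame *list column* holding S4 objects.
# This is not a vectorised S4 column; each row stores one separate object.
setClass("person", representation(name = "character", age = "numeric"))

john <- new("person", name = "John", age = 20)
tom  <- new("person", name = "Tom",  age = 30)

# I() marks the list "as is", so data.frame() keeps it as a single column
# with one list element per row instead of splitting it into columns.
df <- data.frame(id = c(1, 2), pers = I(list(john, tom)))

# Slot access goes through the list element for the row of interest.
second_name <- df$pers[[2]]@name

# Pull one slot out of every row into an ordinary numeric vector.
ages <- vapply(df$pers, function(p) p@age, numeric(1))
```

The cost of this layout is that every operation on the column has to loop over the list, so it is likely to be slow for large tables.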

There are some other tabular data structures that might do it - data.table perhaps:

library(data.table)
setClass("geezer", representation(name="character", age="numeric"))
tom=new("geezer",name="Tom",age=20)
dick=new("geezer",name="Dick",age=23)
harry=new("geezer",name="Harry",age=25)
# c() on these S4 objects returns a plain list, so geezers becomes a list column
gt = data.table(geezers=c(tom,dick,harry),weapons=c("Gun","Gun","Knife"))
gt
    geezers weapons
1: <geezer>     Gun
2: <geezer>     Gun
3: <geezer>   Knife

The semantics of data.table are a bit different from those of data.frame, so don't expect to plug a data.table into any code written for a data.frame and have it work (for example, I suspect lm and glm will go wobbly). But it seems the data.table authors allow compound classes in columns...

Spacedman answered Oct 04 '22 03:10



Here's your class, with a "column" interpretation of its definition rather than a "row" one: a single object holds a vector per slot, with one element per person. This will be important for performance. There is also a date vector for reference.

setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))
date <- as.POSIXct(c('2013-01-01', '2013-01-02'))

Some experimenting, including looking at methods(class="POSIXct") and paying attention to error messages, led me to implement as.data.frame.person and format.person (the latter is used for display in a data.frame) as

as.data.frame.person <-
    function(x, row.names=NULL, optional=FALSE, ...)
{
    if (is.null(row.names))
        row.names <- x@name
    value <- list(x)
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}

format.person <- function(x, ...) paste0(x@name, ", ", x@age)

This gets me my objects in a data.frame:

> lst <- list(id=1:2, date=date, pers=pers)
> as.data.frame(lst)
     id       date     pers
John  1 2013-01-01 John, 20
Tom   2 2013-01-02  Tom, 30
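For reference, the inspection step mentioned earlier (seeing what POSIXct supplies so that it can live in a data.frame) might look roughly like the following. This is only a sketch; the variable names adf_method, fmt_method and sub_method are made up for illustration. It also shows that POSIXct is an S3 class, so its data.frame support comes from ordinary S3 methods rather than from anything S4-specific:

```r
date <- as.POSIXct(c("2013-01-01", "2013-01-02"))

# POSIXct objects are S3 objects, not S4 instances.
is_s4 <- isS4(date)
is_ct <- inherits(date, "POSIXct")

# S3 methods that let a POSIXct vector behave as a data.frame column:
# conversion to a one-column data.frame, display, and subsetting.
adf_method <- utils::getS3method("as.data.frame", "POSIXct")
fmt_method <- utils::getS3method("format", "POSIXct")
sub_method <- utils::getS3method("[", "POSIXct")
```

These three generics are the same ones the person class needs methods for in this answer.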

If I want to subset, then I need

setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
    initialize(x, name=x@name[i], age=x@age[i])
})

I'm not sure what other methods might be required as more data.frame operations are encountered; there is no "data.frame interface".
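One way to probe which methods are needed is to put the pieces together in one script and try individual data.frame operations. The sketch below repeats the definitions from this answer and checks row subsetting. The registerS3method() calls are an addition of mine, an assumption to make S3 dispatch independent of where the script is evaluated, and the length method is included because the data.table construction in this answer turns out to need it:

```r
setClass("person", representation(name = "character", age = "numeric"))

# S3 methods used by data.frame() for construction and display.
as.data.frame.person <- function(x, row.names = NULL, optional = FALSE, ...) {
    if (is.null(row.names))
        row.names <- x@name
    value <- list(x)                      # the whole object is one column
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}
format.person <- function(x, ...) paste0(x@name, ", ", x@age)

# Assumption: explicit registration, so dispatch does not rely on these
# functions being visible from the global environment.
registerS3method("as.data.frame", "person", as.data.frame.person)
registerS3method("format", "person", format.person)

# S4 methods for vector-like behaviour.
setMethod("[", "person", function(x, i, j, ..., drop = TRUE) {
    initialize(x, name = x@name[i], age = x@age[i])
})
setMethod("length", "person", function(x) length(x@name))

pers <- new("person", name = c("John", "Tom"), age = c(20, 30))
df <- as.data.frame(list(id = 1:2, pers = pers))

# Row subsetting calls the "[" method on the person column.
sub <- df[2, ]
sub_label <- format(sub$pers)
```

Other operations (rbind, merge, sorting by the column) are not covered here and may well need further methods.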

Using the vectorized class in data.table seems to require a length method for construction.

> library(data.table)
> data.table(id=1:2, pers=pers)
Error in data.table(id = 1:2, pers = pers) : 
  problem recycling column 2, try a simpler type
> setMethod(length, "person", function(x) length(x@name))
[1] "length"
> data.table(id=1:2, pers=pers)
   id     pers
1:  1 John, 20
2:  2  Tom, 30

Maybe there's a data.table interface?

Martin Morgan answered Oct 04 '22 03:10
