I want to create a data.frame of different variables, including objects of S4 classes. For a built-in class like "POSIXlt" (for dates) this works fine:
as.data.frame(list(id=c(1,2),
                   date=c(as.POSIXlt('2013-01-01'), as.POSIXlt('2013-01-02'))))
But now I have a user-defined class, let's say a "person" class with a name and an age:
setClass("person", representation(name="character", age="numeric"))
But the following fails:
as.data.frame(list(id=c(1,2), pers=c(new("person", name="John", age=20),
new("person", name="Tom", age=30))))
I also tried to overload the "[" operator for the person class using
setMethod(
f = "[",
signature="person",
definition=function(x,i,j,...,drop=TRUE){
initialize(x, name=x@name[i], age = x@age[i])
}
)
This allows for vector-like behavior:
persons = new("person", name=c("John","Tom"), age=c(20,30))
p1 = persons[1]
But still the following fails:
as.data.frame(list(id=c(1,2), pers=persons))
Perhaps I have to overload more operators to get the user-defined class into a data frame? I am sure there must be a way to do this, since POSIXlt is a class too (although an S3 class, not an S4 one) and it works! Any solution using the newer R5 reference classes would also be fine!
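A quick way to see why POSIXlt behaves differently from person: data.frame() converts each column by calling as.data.frame() on it, and base R ships an S3 method for POSIXlt but has nothing for a newly defined S4 class, so the default method raises an error. The sketch below assumes only base R and the person class from the question; the getS3method() lookups are simply my choice of diagnostic, not something from the original post.

```r
# Minimal sketch: compare how data.frame() finds a conversion method
# for POSIXlt (an S3 class) versus an S4 class that has no method.
setClass("person", representation(name = "character", age = "numeric"))

d <- as.POSIXlt(c("2013-01-01", "2013-01-02"))
persons <- new("person", name = c("John", "Tom"), age = c(20, 30))

# POSIXlt is an S3 class (a classed list), not an S4 object
print(isS4(d))        # FALSE
print(class(d))       # "POSIXlt" "POSIXt"

# base R provides an S3 as.data.frame() method for it ...
print(!is.null(getS3method("as.data.frame", "POSIXlt", optional = TRUE)))  # TRUE

# ... but there is none for "person", so dispatch falls back to the default
print(isS4(persons))  # TRUE
print(is.null(getS3method("as.data.frame", "person", optional = TRUE)))    # TRUE

# data.frame() calls as.data.frame() on each column, so this signals an error
res <- tryCatch(as.data.frame(list(id = c(1, 2), pers = persons)),
                error = function(e) e)
print(inherits(res, "error"))  # TRUE
```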
I do not want to put all my data into the person class. (You could ask why "id" is not simply a slot of person, so that I would not need data frames at all.) The idea is that my data.frame represents a table from a database with many columns of different types, e.g., strings and numbers, but also dates, intervals, geo-objects, etc. For dates I already have a solution (POSIXlt), but for intervals, geo-objects, etc. I probably need to define my own S4/R5 classes.
Thanks a lot in advance.
Judging by this thread on the mailing list:
http://tolstoy.newcastle.edu.au/R/e2/devel/06/11/1013.html
...John Chambers was thinking about this in 2006, and we still can't put S4 objects in columns of data frames. Nor can we put complex S3 classes in columns of data frames.
There are other tabular data structures that might do it; data.table, perhaps:
require(data.table)
setClass("geezer", representation(name="character", age="numeric"))
tom=new("geezer",name="Tom",age=20)
dick=new("geezer",name="Dick",age=23)
harry=new("geezer",name="Harry",age=25)
gt = data.table(geezers=c(tom,dick,harry),weapons=c("Gun","Gun","Knife"))
gt
geezers weapons
1: <geezer> Gun
2: <geezer> Gun
3: <geezer> Knife
The semantics of data.table are a bit different from those of data.frame, so don't expect to plug a data.table into any code that uses a data.frame and have it work (for example, I suspect lm and glm will go wobbly). But it seems the data.table authors allow compound classes in columns...
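For comparison, base R can get close to what data.table does here with a list column protected by I(): each row then holds one whole S4 object in a list element, which is also what the data.table column above contains, rather than a true vectorized S4 column. This is a sketch under the assumption that per-row objects suit your use case; I avoid printing the result, since formatting a list of S4 objects for display is not guaranteed to work.

```r
# Minimal sketch: a base-R list column holding S4 objects, the closest
# analogue to the data.table example (each cell is one list element).
setClass("geezer", representation(name = "character", age = "numeric"))

tom   <- new("geezer", name = "Tom",   age = 20)
dick  <- new("geezer", name = "Dick",  age = 23)
harry <- new("geezer", name = "Harry", age = 25)

# I() marks the list "as is", so data.frame() keeps it as a single column
gt <- data.frame(geezers = I(list(tom, dick, harry)),
                 weapons = c("Gun", "Gun", "Knife"))

# The row count comes from the list length; cells are extracted with [[ ]]
nrow(gt)
gt$geezers[[2]]@name
```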
Here's your class with a "column" interpretation of its definition (each slot holds one value per row) rather than a "row" one, which will be important for performance; I also create a date vector for reference:
setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))
date <- as.POSIXct(c('2013-01-01', '2013-01-02'))
Some experimenting, including looking at methods(class="POSIXct") and paying attention to error messages, led me to implement as.data.frame.person and format.person (the latter is used for display in a data.frame) as follows:
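# S3 method: turn a whole (vectorized) person object into a one-column data.frame
as.data.frame.person <-
    function(x, row.names=NULL, optional=FALSE, ...)
{
    if (is.null(row.names))
        row.names <- x@name          # use the names as row names (must be unique)
    value <- list(x)                 # the S4 object itself is the single column
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}
# S3 method used when a data.frame containing a person column is printed
format.person <- function(x, ...) paste0(x@name, ", ", x@age)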
This gets me my objects in a data.frame:
> lst <- list(id=1:2, date=date, pers=pers)
> as.data.frame(lst)
id date pers
John 1 2013-01-01 John, 20
Tom 2 2013-01-02 Tom, 30
If I want to subset, then I need
setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
initialize(x, name=x@name[i], age=x@age[i])
})
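Putting the pieces of this answer together, here is a self-contained sketch that builds the data frame and then takes a row subset. As far as I can tell, "[.data.frame" subsets each non-matrix column with xj[i], which is why the S4 "[" method is what makes df[2, ] work; the names df and row2 are just illustrative.

```r
# Self-contained sketch: the column-oriented "person" class, the S3 methods
# from this answer, the S4 "[" method, and a row subset of the result.
setClass("person", representation(name = "character", age = "numeric"))

# S3 coercion method: the whole object becomes one data.frame column
as.data.frame.person <- function(x, row.names = NULL, optional = FALSE, ...) {
  if (is.null(row.names))
    row.names <- x@name          # assumes the names are unique
  value <- list(x)
  attr(value, "row.names") <- row.names
  class(value) <- "data.frame"
  value
}

# S3 format method: used when the data.frame is printed
format.person <- function(x, ...) paste0(x@name, ", ", x@age)

# S4 "[" method: subset every slot in parallel
setMethod("[", "person", function(x, i, j, ..., drop = TRUE) {
  initialize(x, name = x@name[i], age = x@age[i])
})

pers <- new("person", name = c("John", "Tom"), age = c(20, 30))
date <- as.POSIXct(c("2013-01-01", "2013-01-02"))
df <- as.data.frame(list(id = 1:2, date = date, pers = pers))
print(df)

# Row subsetting calls pers[2] internally, dispatching to the "[" method
row2 <- df[2, ]
print(row2$pers@name)
```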
I'm not sure what other methods might be required as more data.frame operations are encountered; there is no formal "data.frame interface" to implement.
Using the vectorized class in data.table seems to require a length method for construction.
> library(data.table)
> data.table(id=1:2, pers=pers)
Error in data.table(id = 1:2, pers = pers) :
problem recycling column 2, try a simpler type
> setMethod(length, "person", function(x) length(x@name))
[1] "length"
> data.table(id=1:2, pers=pers)
id pers
1: 1 John, 20
2: 2 Tom, 30
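The effect of the length method can be seen without data.table: an S4 object that is not itself a vector reports length 1, even though this person object holds two rows. My reading of the "problem recycling column 2" error, which is an assumption rather than something checked against the data.table source, is that data.table compares each column's length with the number of rows.

```r
# Minimal sketch: why a length() method matters for column-style classes.
setClass("person", representation(name = "character", age = "numeric"))
pers <- new("person", name = c("John", "Tom"), age = c(20, 30))

# Without a method, a non-vector S4 object has length 1
print(length(pers))

# Report the number of "rows", i.e. the length of the slot vectors
setMethod("length", "person", function(x) length(x@name))
print(length(pers))
```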
Maybe there's a data.table interface?