In the R package data.table, the manual entry ?data.table-class says that 'data.table' can be used for inheritance in a class definition, i.e. in the contains argument of a call to setClass:
library("data.table")
setClass("Data.Table", contains = "data.table")
However, if I create an instance of Data.Table, I would have expected to be able to treat it like a data.table. This is not so. The following snippet results in an error which, as far as I understand, occurs because the [.data.table function cannot handle the mix of S3 and S4 dispatch:
dat <- new("Data.Table", data.table(x = 1))
dat[TRUE]
I solved this by defining a new method for [ and coercing any Data.Table to a data.table before evaluating it therein.
setMethod(
  "[",
  "Data.Table",
  function(x, i, j, ..., drop = TRUE) {
    mc <- match.call()
    mc$x <- substitute(S3Part(x, strictS3 = TRUE))
    Data.Table(
      eval(mc, envir = parent.frame())
    )
  })
And a constructor function to feel more comfortable with it:
Data.Table <- function(...) new("Data.Table", data.table(...))
dat <- Data.Table(x = 1, key = "x")
dat[1]
This is acceptable for some scenarios, but I lose all get and set functions from the data.table package, and I suspect that I have broken some other features. So the question is: how do I implement a working S4 data.table class? I would appreciate
There is one related question on SO I found, which presents a similar approach. However, I think it would involve too much coding to be feasible.
I think the short answer (the problem is still as valid as it was when raised) is that using data.table as a superclass in S4 is not recommended, and is not possible without a considerable amount of effort and a certain risk of instability.
It is also not quite clear what the goal was in the case at hand, but let's assume there was no alternative, such as forking and modifying the existing data.table package.
Then, to illustrate the case mentioned above with [, let's first initialize the example:
# replicating some code from above
library("data.table")
Data.Table <- setClass("Data.Table", contains = "data.table")
dat <- Data.Table(data.table(x = 1))
dat[1]
> Error in if (n > 0) c(NA_integer_, -n) else integer() :
argument is of length zero
dat2 <- data.table(x = 1)
Now let's check [.data.table, which is a lot of code, as you can see in data.table.R on the GitHub repo, so I am just reproducing the relevant part in the simplest dummy way:
# initializing output
ans = vector("list", 1)
# data (just one line of code as we have just one value in our example).
# desired subscript is row 1, but we have just one column as well.
ans[[1]] <- dat[[1]][1]
# add 'names' attribute
setattr(ans, "names", "x")
# set 'class' attribute
setattr(ans, "class", class(dat))
# set 'row.names'
setattr(ans, "row.names", .set_row_names(nrow(ans)))
And there we have the error when trying to set the row.names, which doesn't work because dim(ans), and therefore nrow(ans), is NULL.
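The failure at this step can be reproduced without data.table at all. The following is a minimal sketch in base R, assuming only that the object is still a plain list with no dim method to fall back on (the variable names plain, n and msg are illustrative):

```r
# A plain list has no 'dim' attribute, so nrow() returns NULL.
plain <- list(x = 1)
n <- nrow(plain)
is.null(n)

# base::.set_row_names() evaluates `if (n > 0)`, which cannot be
# decided for a NULL argument, so the call stops with an error.
msg <- tryCatch(.set_row_names(n), error = function(e) conditionMessage(e))
msg
```

The message captured in msg should correspond to the one shown for dat[1] above.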
The underlying cause is the preceding call, setattr(ans, "class", class(dat)), which doesn't work well (try isS4(ans) or print(ans) just afterwards). In fact, ?class says about S4: "The replacement version of the function sets the class to the value provided. For classes that have a formal definition, directly replacing the class this way is strongly deprecated. The expression as(object, value) is the way to coerce an object to a particular class."
data.table's setattr, which through C uses R's setAttrib function, is similar to calling attr(ans, "class") <- "Data.Table" or class(ans) <- "Data.Table", which would go wrong as well.
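To see why replacing the class attribute is not enough, here is a small sketch using only the methods package. The class name S4List is hypothetical and stands in for Data.Table; it extends a basic list so that the example does not depend on data.table:

```r
library(methods)

# Hypothetical S4 class extending a basic type, for illustration only.
setClass("S4List", contains = "list")

# Setting the class attribute directly only labels the object;
# it does not turn it into a formal S4 instance.
labelled <- list(x = 1)
attr(labelled, "class") <- "S4List"
isS4(labelled)

# Going through the S4 machinery creates a proper instance.
proper <- new("S4List", list(x = 1))
isS4(proper)
```

Comparing the two isS4() results shows the difference between an object that merely carries the class name and one that was actually constructed as an instance of the class.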
If you do setattr(ans, "class", class(dat2)) instead, you will see that everything is fine here, as it should be with S3.
One more word of caution though: setattr(ans, "class", "data.frame") followed by print(ans) or dim(ans) may not look very nice to you... (although ans$x is ok).
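The odd output in that case comes from the missing row.names attribute, from which the data.frame methods derive the number of rows. A small base-R sketch of this, with illustrative variable names and plain attr<- used in place of setattr:

```r
# A list that is merely labelled as a data.frame, without row.names.
ans <- list(x = 1)
attr(ans, "class") <- "data.frame"
before <- dim(ans)   # row count is derived from the missing row.names
ans$x                # column access still works

# Once row.names are set, the dimensions are reported as expected.
attr(ans, "row.names") <- .set_row_names(1L)
after <- dim(ans)
before
after
```

This is only meant to show why the order of the attribute assignments matters; it is not how data.table itself builds its result.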
Overriding setattr() in a good way isn't trivial either, and such an approach will probably not get you any farther than the approach you have outlined above. The result could be something like:
setattr_new <- function(x, name, value) {
  if (name == "class" && "Data.Table" %in% value) {
    value <- c("data.table", "data.frame")
  }
  if (name == "names" && is.data.table(x) && length(attr(x, "names")) && !is.null(value))
    setnames(x, value)
  else {
    ans = .Call(Csetattrib, x, name, value)
    if (!is.null(ans)) {
      warning("Input is a length=1 logical that points to the same address as R's global TRUE value. Therefore the attribute has not been set by reference, rather on a copy. You will need to assign the result back to a variable. See https://github.com/Rdatatable/data.table/issues/1281 for more.")
      x = ans
    }
  }
  if (name == "levels" && is.factor(x) && anyDuplicated(value))
    .Call(Csetlevels, x, (value <- as.character(value)), unique(value))
  invisible(x)
}
godmode:::assignAnywhere("setattr", setattr_new)
identical(dat[1], dat2[1])
[1] TRUE
# then possibly convert back to S4 class if desired for further processing at the end
as(dat[1], "Data.Table")