
Why does data.table lose class definition in .SD after group by?

I've tried to attach my own class to a numeric column in order to alter its output format. This works fine, but after a group by the column loses the class and reverts to a plain numeric.

Example: Define a new format function for my class:

format.myclass <- function(x, ...){
  paste("!!", x, "!!", sep = "")
}

Then make a small data.table and change one of the columns to myclass:

> DT <- data.table(L = rep(letters[1:3],3), N = 1:9)
> setattr(DT$N, "class", "myclass")
> DT
   L     N
1: a !!1!!
2: b !!2!!
3: c !!3!!
4: a !!4!!
5: b !!5!!
6: c !!6!!
7: a !!7!!
8: b !!8!!
9: c !!9!!

Now perform a group by and the N column reverts to integer:

> DT[, .SD, by = L]
   L N
1: a 1
2: a 4
3: a 7
4: b 2
5: b 5
6: b 8
7: c 3
8: c 6
9: c 9

> DT[, sapply(.SD, class), by = L]
   L      V1
1: a integer
2: b integer
3: c integer

Any idea why?

Corvus asked Feb 07 '13 14:02


1 Answer

Because whenever R subsets a vector with the default "[", it throws away the class (along with most other attributes). Why? Well, because it's an aRse, that's why. You need to write a "[" subset method for your class.
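
You can see the same thing in plain base R, without data.table involved at all. This is only a minimal sketch: it reuses the class name "myclass" from the question, and the variable names are made up for illustration.

```r
# Attach the question's class name to a plain integer vector.
x <- structure(1:9, class = "myclass")

# No "[.myclass" method exists yet, so this uses the default "[".
part <- x[1:2]

class(x)     # "myclass"
class(part)  # "integer": the class attribute did not survive the subset
```

So the class is lost by ordinary subsetting, before any grouping logic comes into it.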

> DT[,N]
[1] 1 2 3 4 5 6 7 8 9
attr(,"class")
[1] "myclass"
> DT[1:2,N]
[1] 1 2

See how subsetting the vector has removed the class? That's the problem: data.table is doing this to your vector at some point. Write a "[" method (just copy the one that Date uses):

"[.myclass" <- function (x, ..., drop = TRUE){
    cl <- oldClass(x)         # remember the class before subsetting
    class(x) <- NULL
    val <- NextMethod("[")    # default subsetting, which drops the class
    class(val) <- cl          # put the class back on the result
    val
}

> DT[1:2,N]
[1] 1 2
attr(,"class")
[1] "myclass"

And now it has some class. This also fixes your last line with the sapply:

> DT[, sapply(.SD, class), by = L]
   L      V1
1: a myclass
2: b myclass
3: c myclass

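If you want to check the idea end to end without data.table, here is a rough base-R sketch. It is not how data.table groups internally: the manual split by index is just a stand-in for "group by", the "[" method is a simplified variant of the one above, and names like `groups` and `group_classes` are made up for illustration.

```r
# Sketch only: base R, no data.table. "myclass" is the class name from the question.

# Formatting method in the spirit of the question's format.myclass.
format.myclass <- function(x, ...) {
  paste0("!!", unclass(x), "!!")
}

# Subset method: let the default "[" do the work, then restore the class.
"[.myclass" <- function(x, ...) {
  cl <- oldClass(x)
  val <- NextMethod("[")
  class(val) <- cl
  val
}

x <- structure(1:9, class = "myclass")
g <- rep(letters[1:3], 3)

part <- x[1:2]
class(part)    # "myclass"
format(part)   # "!!1!!" "!!2!!"

# Stand-in for grouping: split the row indices by g, then subset x per group.
# Each subset goes through "[.myclass", so every piece keeps the class.
groups <- lapply(split(seq_along(x), g), function(i) x[i])
group_classes <- vapply(groups, function(v) class(v)[1], character(1))
group_classes  # "myclass" for each of a, b, c
```

The point is only that once "[" is class-aware, anything that builds its groups by subsetting with "[" hands the class through to each piece.
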
Spacedman answered Sep 29 '22 11:09