
`j` doesn't evaluate to the same number of columns for each group

Tags: r, data.table

I am trying to use data.table where my j expression can return a different number of columns for each group. I would like it to behave like plyr's rbind.fill, which fills any missing columns with NA.
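For reference, here is a minimal sketch of the rbind.fill behavior meant here. It uses two small data frames shaped like the two pieces fetch() returns below. The names piece1 and piece2 are only illustrative.

```r
library(plyr)

# Two pieces with different columns, like fetch() returns for id 1 and id 2
piece1 <- data.frame(A = "a", B = "b", stringsAsFactors = FALSE)
piece2 <- data.frame(B = "b", stringsAsFactors = FALSE)

# rbind.fill takes the union of the column names (A, B) and pads the
# column missing from piece2 (A) with NA
filled <- rbind.fill(piece1, piece2)
filled
```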

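library(data.table)

fetch <- function(by) {
    # by is .BY, a named list holding the current group's key, e.g. list(id = 1)
    if (by$id == 1)
        data.table(A = c("a"), B = c("b"))
    else
        data.table(B = c("b"))
}
data <- data.table(id = c(1, 2))
result <- data[, fetch(.BY), by = id]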

In this case 'result' should end up with two data columns, A and B. Both 'A' and 'B' are returned by the first call to 'fetch', but only 'B' is returned by the second. I would like the example code to return this result:

  id    A B
1  1    a b
2  2 <NA> b

Unfortunately, when I run it, I get this error:

Error in `[.data.table`(data, , fetch(.BY, .SD), by = id) : 
j doesn't evaluate to the same number of columns for each group

I can do this with plyr as follows, but in my real-world use case plyr runs out of memory. Each call to fetch completes quickly; the memory crash happens when plyr tries to combine all of the pieces back together. I am trying to see whether data.table might solve this problem for me.

library(plyr)

result <- ddply(data, "id", fetch)

Any thoughts appreciated.

asked Sep 26 '13 at 16:09 by Nick Allen


1 Answer

DWin's approach is good. Alternatively, you could return a list column, where each cell is itself a vector. That's generally a better way of handling variable-length vectors.

DT = data.table(A=rep(1:3,1:3),B=1:6)
DT
   A B
1: 1 1
2: 2 2
3: 2 3
4: 3 4
5: 3 5
6: 3 6
ans = DT[, list(list(B)), by=A]
ans
   A    V1
1: 1     1
2: 2   2,3     # V1 is a list column. These aren't strings, the
3: 3 4,5,6     # vectors just display with commas

ans$V1[3]
[[1]]
[1] 4 5 6

ans$V1[[3]]
[1] 4 5 6

ans[,sapply(V1,length)]
[1] 1 2 3
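On R 3.2.0 and later, base R's lengths() returns the same per-cell lengths as sapply(V1, length) without the explicit sapply call. Here is a minimal sketch that reuses the DT example above:

```r
library(data.table)

DT <- data.table(A = rep(1:3, 1:3), B = 1:6)
ans <- DT[, list(list(B)), by = A]   # V1 is a list column, one vector per group

# lengths() (base R >= 3.2.0) returns the length of each list element
ans[, lengths(V1)]
```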

So in your example you could use this as follows:

library(plyr)

rbind.fill(data[, list(list(fetch(.BY))), by = id]$V1)
#     A B
#1    a b
#2 <NA> b
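Later data.table releases also provide rbindlist() with a fill argument. It does the same NA padding as rbind.fill, so plyr is not needed for this step. The sketch below assumes a data.table version recent enough to support the fill and idcol arguments of rbindlist(). The intermediate name pieces is only for illustration.

```r
library(data.table)

# The question's fetch(): a different set of columns per group
fetch <- function(by) {
    if (by$id == 1)
        data.table(A = "a", B = "b")
    else
        data.table(B = "b")
}

data <- data.table(id = c(1, 2))

# Store each group's data.table as one cell of a list column
pieces <- data[, list(list(fetch(.BY))), by = id]

# Stack the pieces; fill = TRUE pads missing columns with NA.
# Naming the list by id and using idcol keeps the group key
# (as a character column built from the list names).
result <- rbindlist(setNames(pieces$V1, pieces$id), fill = TRUE, idcol = "id")
result
```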

Or just make the returned list conformant:

allcols = c("A","B")
fetch <- function(by) {
    if(by == 1)
        list(A=c("a"), B=c("b"))[allcols]
    else
        list(B=c("b"))[allcols]
}
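A variant of the conformant-list idea pads missing columns with an explicit typed NA. This avoids relying on list indexing, because indexing a list by a name it lacks, as in list(B = "b")[allcols], gives a NULL element named NA and leaves it to data.table to handle the empty column. The sketch below assumes all columns are character. The helper conform() is hypothetical and not part of data.table.

```r
library(data.table)

allcols <- c("A", "B")

# Hypothetical helper: returns a list with exactly the names in allcols,
# copying values from x and filling anything x lacks with NA_character_
conform <- function(x) {
    out <- setNames(rep(list(NA_character_), length(allcols)), allcols)
    out[names(x)] <- x
    out
}

fetch <- function(by) {
    if (by$id == 1)
        conform(list(A = "a", B = "b"))
    else
        conform(list(B = "b"))
}

data <- data.table(id = c(1, 2))
result <- data[, fetch(.BY), by = id]   # every group now returns A and B
result
```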
answered Nov 09 '22 at 21:11 by Matt Dowle