 

data.table aggregation to list column

Tags: r, data.table

I'm trying to aggregate data from a data.table to create a new column that is a list of the values in each group's rows. It's easier to see by example:

library(data.table)
dt <- data.table(id = c(1,1,1,1,2,2,3,3,3), letter = c('a','a','b','c','a','c','b','b','a'))

I would like to aggregate this so that the result is

   id  letter
1:  1 a,a,b,c
2:  2     a,c
3:  3   b,b,a  

Intuitively, I tried

dt[,j = list(list(letter)), by = id]

but that doesn't give the expected result. Oddly enough, when I go case by case, for example:

> dt[id == 1,j = list(list(letter)), by = id]

   id      V1
1:  1 a,a,b,c

the result is fine. I feel like I'm missing an .SD somewhere, or something like that.

Can anybody point me in the right direction?

Thanks!

asked Mar 19 '23 19:03 by MagicScout



1 Answer

Update: DT[, list(list(.)), by=.] sometimes returned wrong results in R >= 3.1.0. This is now fixed in commit #1280 in the current development version of data.table, v1.9.3. From NEWS:

  • DT[, list(list(.)), by=.] returns correct results in R >=3.1.0 as well. The bug was due to recent (welcoming) changes in R v3.1.0 where list(.) does not result in a copy. Closes #481.
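A quick sanity check, sketched under the assumption that base R's split() is an acceptable independent reference: group the same data both ways and compare each list cell with the matching split() element. On an affected R/data.table combination this comparison would be expected to fail for some groups; on a fixed version every group should match.

```r
library(data.table)

dt <- data.table(id = c(1,1,1,1,2,2,3,3,3),
                 letter = c('a','a','b','c','a','c','b','b','a'))

res <- dt[, list(list(letter)), by = id]

# Reference grouping computed without data.table: a named list keyed by id
ref <- split(dt$letter, dt$id)

# Compare group by group; a cell holding the wrong values gives FALSE
ok <- mapply(function(got, key) identical(got, ref[[as.character(key)]]),
             res$V1, res$id)
print(all(ok))
```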

With this update, I() is no longer necessary. You can just use DT[, list(list(.)), by=.] as before.
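For example, with a data.table version that includes the fix (assumed here to be any release from v1.9.3 on), the question's data can be aggregated directly. This sketch names the result column `letter` inside the outer list() instead of relying on the default `V1`:

```r
library(data.table)

dt <- data.table(id = c(1,1,1,1,2,2,3,3,3),
                 letter = c('a','a','b','c','a','c','b','b','a'))

# The outer list() builds j's result columns; the inner list() wraps each
# group's character vector so it is stored as one cell of a list column.
res <- dt[, list(letter = list(letter)), by = id]

print(res)            # list cells print comma-separated, e.g. a,a,b,c
class(res$letter)     # "list"
res$letter[[1]]       # the character vector for id == 1
```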


This seems to be similar to the known bug #5585. In your case, I think you could just use

dt[, paste(letter, collapse=","), by = id] 

to fix your problem.
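A sketch of the paste() approach with the result column named `letter` (the call above leaves it as `V1`). If the individual letters are needed again later, strsplit() can recover them, assuming the values themselves never contain a comma:

```r
library(data.table)

dt <- data.table(id = c(1,1,1,1,2,2,3,3,3),
                 letter = c('a','a','b','c','a','c','b','b','a'))

# One character string per group: a plain (atomic) column, no list needed
flat <- dt[, list(letter = paste(letter, collapse = ",")), by = id]
print(flat)

# strsplit() turns the strings back into a list of character vectors,
# one element per group (assumes no value contains a comma)
back <- strsplit(flat$letter, ",", fixed = TRUE)
```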

As @ilir pointed out, if you actually want a list column (rather than the comma-separated character strings shown above), you could use the workaround suggested in the bug report:

dt[, list(list(I(letter))), by = id]
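A sketch of the I() workaround. The cells may carry the "AsIs" class that I() adds; if plain character vectors are wanted downstream, unclass() strips it. The cleaned cells are kept in a separate list (`plain`, a name chosen only for this example) rather than reassigned into the data.table:

```r
library(data.table)

dt <- data.table(id = c(1,1,1,1,2,2,3,3,3),
                 letter = c('a','a','b','c','a','c','b','b','a'))

# Workaround from the bug report: wrap each group's vector in I()
res <- dt[, list(list(I(letter))), by = id]

# Cells may carry the "AsIs" class added by I(); unclass() removes it
plain <- lapply(res$V1, unclass)
plain[[1]]
```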
answered Mar 31 '23 16:03 by shadow
