I'm trying to aggregate data in a data.table to create a new column that is a list of the values from each group's rows. It's easier to see by example:
dt <- data.table(id = c(1,1,1,1,2,2,3,3,3), letter = c('a','a','b','c','a','c','b','b','a'))
I would like to aggregate this in such a way that the result is:
   id  letter
1:  1 a,a,b,c
2:  2     a,c
3:  3   b,b,a
Intuitively, I tried
dt[,j = list(list(letter)), by = id]
but that doesn't work. Oddly enough, when I go case by case, for example:
> dt[id == 1,j = list(list(letter)), by = id]
id V1
1: 1 a,a,b,c
the result is fine. I feel like I'm missing an .SD somewhere, or something like that.
Can anybody point me in the right direction?
Thanks!
Update: The behaviour of DT[, list(list(.)), by=.] sometimes produced wrong results in R version >= 3.1.0. This is now fixed in commit #1280 in the current development version of data.table, v1.9.3. From NEWS:
DT[, list(list(.)), by=.] returns correct results in R >= 3.1.0 as well. The bug was due to recent (welcoming) changes in R v3.1.0 where list(.) does not result in a copy. Closes #481.
With this update, I() is no longer necessary. You can just use DT[, list(list(.)), by=.] as before.
This seems to be similar to the known bug #5585. In your case, I think you could just use
dt[, paste(letter, collapse=","), by = id]
to fix your problem.
As @ilir pointed out, if you actually want a list column (rather than the displayed character string), you could use the workaround suggested in the bug report:
dt[, list(list(I(letter))), by = id]
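For comparison, here is a minimal sketch of both outputs side by side, assuming a data.table release in which the list(list(.)) bug is fixed (v1.9.3 or later, per the update above), so I() is not needed. The names as_string and as_list are just illustrative. The first call gives one comma-separated character string per id; the second gives a list column where each cell holds the full character vector for that id.

```r
library(data.table)

dt <- data.table(id     = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
                 letter = c("a", "a", "b", "c", "a", "c", "b", "b", "a"))

# Character column: letters collapsed into one string per id
as_string <- dt[, list(letter = paste(letter, collapse = ",")), by = id]

# List column: each cell keeps the original character vector for that id
# (assumes a data.table version where list(list(.)) works without I())
as_list <- dt[, list(letter = list(letter)), by = id]

print(as_string)
# Inspect the list column; each element is a plain character vector
str(as_list$letter)
```

The list column prints like the character version (a,a,b,c), which is why the two are easy to confuse, but as_list$letter[[1]] is still the vector c("a", "a", "b", "c") and can be used directly in further computations.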