Is it possible in data.table
to perform recursive assignment of multiple columns? By recursive I mean that the next assignment depends on the previous assignment:
library(data.table)
DT = data.table(id=rep(LETTERS[1:4], each=2), val=1:8)
DT[, c("cumsum", "cumsumofcumsum"):=list(cumsum(val), cumsum(cumsum)), by=id]
# Error in `[.data.table`(DT, , `:=`(c("cumsum", "cumsumofcumsum"), list(cumsum(val), :
# cannot coerce type 'builtin' to vector of type 'double'
Of course, one can do the assignments individually, but I guess the overhead cost (e.g. grouping) wouldn't be shared among the operations:
DT = data.table(id=rep(LETTERS[1:4], each=2), val=1:8)
DT[, c("cumsum"):=cumsum(val), by=id]
DT[, c("cumsumofcumsum"):=cumsum(cumsum), by=id]
DT
# id val cumsum cumsumofcumsum
# 1: A 1 1 1
# 2: A 2 3 4
# 3: B 3 3 3
# 4: B 4 7 10
# 5: C 5 5 5
# 6: C 6 11 16
# 7: D 7 7 7
# 8: D 8 15 22
You can use a temporary variable and use it again for others variables:
DT[, c("cumsum", "cumsumofcumsum"):={
x <- cumsum(val)
list(x, cumsum(x))
}, by=id]
Of course you can use dplyr
and use your data.table as a backend, but I am not sure that you will get the same performance as the pure data.table method:
library(dplyr)
DT %>%
group_by(id ) %>%
mutate(
cum1 = cumsum(val),
cum2 = cumsum(cum1)
)
Pure data.table solution is 5 times faster than dplyr one. I guess the sort in dplyr behind the scene can explain this difference.
f_dt <-
function(){
DT[, c("cumsum", "cumsumofcumsum"):={
x <- as.numeric(cumsum(val))
list(x, cumsum(x))
}, by=id]
}
f_dplyr <-
function(){
DT %>%
group_by(id ) %>%
mutate(
cum1 = as.numeric(cumsum(val)),
cum2 = cumsum(cum1)
)
}
library(microbenchmark)
microbenchmark(f_dt(),f_dplyr(),times = 100)
expr min lq median uq max neval
f_dt() 2.580121 2.97114 3.256156 4.318658 13.49149 100
f_dplyr() 10.792662 14.09490 15.909856 19.593819 159.80626 100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With