
How to iterate within a data.table

I want to compute statistics that are both grouped and cumulative. In the simulation below I have 10 observations per day for 5 days, and for each day I compute the standard deviation of all observations up to and including that day.

library(data.table)
library(tictoc)

DURATION <- 5
DAILY_N <- 10
N_PER_COND <- DURATION * DAILY_N

dt <- 
    data.table(
      day = rep(1:DURATION, each = DAILY_N),
      x = rgamma(n=N_PER_COND, shape=5, scale=25)
    )

cum_stdevs <- vector('double', DURATION)

tic()
for (i in seq_along(cum_stdevs)) {
    cum_x <- dt[day <= i, x]
    cum_stdevs[i] <- sd(cum_x)
}
toc()

Is there a way to perform this kind of operation within data.table without resorting to a for loop?

Even with the for loop, using data.table was about 14x faster than using standard data frames.

Joe asked Jun 08 '26 20:06


1 Answer

You could try sapply inside the data.table call, as below. It reuses the cum_stdevs vector from the question only to get the number of days.

cum_stdevs <- dt[, sapply(seq_along(cum_stdevs), function(k) sd(x[day <= k]))]
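As a complement to the sapply one-liner, and not part of the original answer, here is a minimal sketch of a loop-free variant that aggregates each day once and then accumulates. It assumes the sample variance can be rebuilt from running counts, sums, and sums of squares; the table name daily and its column names are hypothetical, and the seed is added only to make the sketch reproducible.

```r
library(data.table)

set.seed(42)  # assumption: seed added only for reproducibility
DURATION <- 5
DAILY_N <- 10

dt <- data.table(
  day = rep(1:DURATION, each = DAILY_N),
  x = rgamma(n = DURATION * DAILY_N, shape = 5, scale = 25)
)

# One row per day: count, sum, and sum of squares of x (hypothetical names)
daily <- dt[, .(n = .N, s = sum(x), ss = sum(x^2)), keyby = day]

# Running totals over days 1..k
daily[, `:=`(cum_n = cumsum(n), cum_s = cumsum(s), cum_ss = cumsum(ss))]

# Sample variance from running totals: (sum(x^2) - sum(x)^2 / n) / (n - 1)
daily[, cum_sd := sqrt((cum_ss - cum_s^2 / cum_n) / (cum_n - 1))]

# Reference values computed the same way as the loop in the question
loop_sd <- sapply(daily$day, function(k) sd(dt[day <= k, x]))
print(all.equal(daily$cum_sd, loop_sd))
```

This touches each observation once instead of re-subsetting for every day, which may matter with many days. The sum-of-squares formula can lose precision when the mean is large relative to the spread, so it is worth checking against sd() on a sample of your own data.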
ThomasIsCoding answered Jun 10 '26 09:06


