Implementing a rolling window using data.table

Tags: r, data.table

I can implement a rolling window by repeatedly 'shifting' my data, and then summarising 'row-wise', but this seems cumbersome and not easily generalisable to different window sizes.

#' Generate dummy data
library(data.table)
set.seed(42)
d <- data.table(id = rep(letters[1:2], each = 5),
                time = rep(1:5, times = 2),
                x = sample.int(10, 10, replace = TRUE))

The data looks like this:

id  time    x
a   1   10
a   2   10
a   3   3
a   4   9
a   5   7
b   1   6
b   2   8
b   3   2
b   4   7
b   5   8

Now take a rolling maximum of x over the current time and the previous 2 times (for each id).

#' Take the maximum of the current x and the previous 2 x values (by id)
#' by creating lagged copies of x with shift()
d[, x.L1 := shift(x, 1, type='lag'), by=id]
d[, x.L2 := shift(x, 2, type='lag'), by=id]
#' na.rm=TRUE drops the NAs that shift() introduces at the start of each id
d[, x.roll.max := max(x, x.L1, x.L2, na.rm=TRUE), by=.(id,time)]

This generates:

id  time    x   x.L1    x.L2    x.roll.max
a   1   10  NA  NA  10
a   2   10  10  NA  10
a   3   3   10  10  10
a   4   9   3   10  10
a   5   7   9   3   9
b   1   6   NA  NA  6
b   2   8   6   NA  8
b   3   2   8   6   8
b   4   7   2   8   8
b   5   8   7   2   8

I am assuming there is a much better way.

asked Jun 10 '26 07:06 by drstevok


1 Answer

So I followed @Frank's suggestion above and went to RcppRoll.

library(RcppRoll)  # roll_max() comes from RcppRoll, not Rcpp
d[, x.roll.max := roll_max(x, n=2L, align='right', fill=NA, na.rm=TRUE), by=id]

And I guess I shouldn't have been trying to do it all in data.table, because this works very nicely.

id  time    x   x.roll.max
a   1   11  NA
a   2   12  12
a   3   4   12
a   4   10  10
a   5   8   10
a   6   7   8
b   1   9   NA
b   2   2   9
b   3   8   8
b   4   9   9
b   5   6   9
b   6   9   9
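
For anyone who would rather stay within data.table, the shifting approach from the question can be generalised to any window size. The following is only a minimal sketch, not a built-in feature: `roll_max_shift` and its `lags` argument are hypothetical names introduced here, and the data is hard-coded from the question's table so the result does not depend on the random number generator.

```r
library(data.table)

# Data copied from the table in the question (hard-coded on purpose).
d <- data.table(
  id   = rep(letters[1:2], each = 5),
  time = rep(1:5, times = 2),
  x    = c(10, 10, 3, 9, 7, 6, 8, 2, 7, 8)
)

# Hypothetical helper (not part of data.table or RcppRoll): rolling maximum
# over the current value and the previous `lags` values. The NAs that
# shift() introduces at the start of a group are ignored via na.rm = TRUE.
roll_max_shift <- function(x, lags) {
  # Always build a list: the original vector plus one lagged copy per lag.
  lagged <- lapply(seq_len(lags), function(k) shift(x, k, type = "lag"))
  # pmax() takes the element-wise maximum across all vectors in the list.
  do.call(pmax, c(list(x), lagged, list(na.rm = TRUE)))
}

# Same result as the x.L1 / x.L2 version in the question, in one step.
d[, x.roll.max := roll_max_shift(x, lags = 2L), by = id]
print(d)
```

Changing `lags` changes the window without adding extra columns. Note that `lags = 2L` means a window of three values (the current one plus two lags), whereas `n` in `roll_max` counts the whole window, so `n=2L` covers the current value and one previous value. Unlike `roll_max` with `fill=NA`, this sketch returns the maximum of whatever values are available for the incomplete windows at the start of each id.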

answered Jun 12 '26 10:06 by drstevok