Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R data.table with rollapply

Tags:

r

data.table

Is there an existing idiom for computing rolling statistics using data.table grouping?

For example, given the following code:

DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)

stat.ror <- DT[,rollapply(v, width=1, by=1, mean, na.rm=TRUE), by=y];

If there isn't one yet, what would be the best way to do it?

like image 954
user2718667 Avatar asked Oct 22 '22 02:10

user2718667


1 Answers

In fact I am trying to solve this very problem right now. Here is a partial solution which will work for grouping by a single column:

Edit: got it with RcppRoll, I think:

windowed.average <- function(input.table,
                             window.width = 2,
                             id.cols = names(input.table)[3],
                             index.col = names(input.table)[1],
                             val.col = names(input.table)[2]) {
  require(RcppRoll)

  avg.with.group <- 
    input.table[,roll_mean(get(val.col), n = window.width),by=c(id.cols)]
  avg.index <- 
    input.table[,roll_mean(get(index.col), n = window.width),by=c(id.cols)]$V1

  output.table <- data.table(
    Group = avg.with.group,
    Index = avg.index)

  # rename columns to (sensibly) match inputs
  setnames(output.table, old=colnames(output.table),
           new = c(id.cols,val.col,index.col))

  return(output.table)
}

A (badly written) unit test that will pass the above:

require(testthat)
require(zoo)
test.datatable <- data.table(Time = rep(seq_len(10), times=2), 
                             Voltage = runif(20), 
                             Channel= rep(seq_len(2),each=10))
test.width <- 8

# first test: single id column
test.avgtable <- data.table(
  test.datatable[,rollapply(Voltage, width = test.width, mean, na.rm=TRUE),
                       by=c("Channel")],
  Time = test.datatable[,rollapply(Time, width = test.width, mean, na.rm=TRUE),
                         by=c("Channel")]$V1)
setnames(test.avgtable,old=names(test.avgtable),
         new=c("Channel","Voltage","Time"))

expect_that(test.avgtable,
            is_identical_to(windowed.average(test.datatable,test.width)))

How it looks:

> test.datatable
    Time     Voltage Channel Class
 1:    1 0.310935570       1     1
 2:    2 0.565257533       1     2
 3:    3 0.577278573       1     1
 4:    4 0.152315111       1     2
 5:    5 0.836052122       1     1
 6:    6 0.655417230       1     2
 7:    7 0.034859642       1     1
 8:    8 0.572040136       1     2
 9:    9 0.268105436       1     1
10:   10 0.126484340       1     2
11:    1 0.139711248       2     1
12:    2 0.336316520       2     2
13:    3 0.413086486       2     1
14:    4 0.304146029       2     2
15:    5 0.399344631       2     1
16:    6 0.581641210       2     2
17:    7 0.183586025       2     1
18:    8 0.009775488       2     2
19:    9 0.449576242       2     1
20:   10 0.938517952       2     2

> test.avgtable
   Channel   Voltage Time
1:       1 0.4630195  4.5
2:       1 0.4576657  5.5
3:       1 0.4028191  6.5
4:       2 0.2959510  4.5
5:       2 0.3346841  5.5
6:       2 0.4099593  6.5

Unfortunately, I haven't managed to make it work with multiple groupings (as this second section shows):

Looks okay for multiple column groups:

# second test: multiple id columns
# Depends on the first test passing to be meaningful.
test.width <- 4
test.datatable[,Class:= rep(seq_len(2),times=ceiling(nrow(test.datatable)/2))]
# windowed.average(test.datatable,test.width,id.cols=c("Channel","Class"))
test.avgtable <- rbind(windowed.average(test.datatable[Class==1,],test.width),
                       windowed.average(test.datatable[Class==2,],test.width))
# somewhat artificially attaching expected class labels
test.avgtable[,Class:= rep(seq_len(2),times=nrow(test.avgtable)/4,each=2)]
setkey(test.avgtable,Channel)
setcolorder(test.avgtable,c("Channel","Class","Voltage","Time"))

expect_that(test.avgtable,
            is_equivalent_to(windowed.average(test.datatable,test.width,
                                             id.cols=c("Channel","Class"))))
like image 118
bright-star Avatar answered Oct 24 '22 12:10

bright-star