Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Roll up a data.table

Tags:

r

data.table

I have a data table:

> (mydt <- data.table(id=c(1,1,1,1,2,2),
                      time=1:6,
                      v1=letters[1:6],
                      v2=LETTERS[1:6],
                      key=c("id","time")))
   id time v1 v2
1:  1    1  a  A
2:  1    2  b  B
3:  1    3  c  C
4:  1    4  d  D
5:  2    5  e  E
6:  2    6  f  F

I want to "roll it up" (is that the right term here?), to a "change" table: object 1 changed 3 times (from timestamp 1 to 2, 2 to 3, and 3 to 4) object 2 changed once (time 5 to 6); I am interested in the initial v1 and final v2. So, the result should be:

> (res <- data.table(beg.time=c(1,2,3,5),
                     end.time=c(2,3,4,6),
                     v1=c('a','b','c','e'),
                     v2=c('B','C','D','F'),
                     key=c("beg.time","end.time")))
   beg.time end.time v1 v2
1:        1        2  a  B
2:        2        3  b  C
3:        3        4  c  D
4:        5        6  e  F
like image 569
sds Avatar asked Sep 17 '13 15:09

sds


People also ask

What is a roll up table?

Rollup tables are tables where totals for (combinations of) conditions are saved. A “summary table”, that holds pre-aggregated values for all conditions you may need to fetch totals for.

What does it mean to roll up data?

Roll-up data is the term given to the data that has been summarized. Before any raw data is purged, the minimum, maximum, and average values are calculated and can be preserved in several ways. Hourly data is the term given to data that is summarized, or rolled-up, every hour.

What is the benefit of creating aggregate or rollup tables?

With rollup tables, you can pre-aggregate your older data for the queries you still need to answer. Then you no longer need to store all of the older data, rather, you can delete the older data or roll it off to slower storage—saving space and computing power.


1 Answers

Thanks for the reproducible example! Here's a shot at it.

First, note that you can use the following head-tail idiom to put entries of a vector that are a set distance apart next to each other:

x <- letters[1:5]
cbind(head(x, -1), tail(x, -1))
     # [,1] [,2]
# [1,] "a"  "b" 
# [2,] "b"  "c" 
# [3,] "c"  "d" 
# [4,] "d"  "e" 
cbind(head(x, -2), tail(x, -2))
     # [,1] [,2]
# [1,] "a"  "c" 
# [2,] "b"  "d" 
# [3,] "c"  "e" 

Then, we can use the by functionality of data.table to do this operation by group.

mydt[,{
    ## if there's just one row in the group of ID's, return nothing
    if (.N == 1) return(NULL) 
    else {
        list(
            ## head and tail take the first and last parts of a vector
            ## this will place an element next to its subsequent element
            beg.time = head(time, -1),
            end.time = tail(time, -1),
            v1 = head(v1, -1),
            v2 = tail(v2, -1)
## group by ID
)}}, by = id]

#    id beg.time end.time v1 v2
# 1:  1        1        2  a  B
# 2:  1        2        3  b  C
# 3:  1        3        4  c  D
# 4:  2        5        6  e  F
like image 103
Blue Magister Avatar answered Sep 19 '22 04:09

Blue Magister