I have a data table:
> (mydt <- data.table(id=c(1,1,1,1,2,2),
time=1:6,
v1=letters[1:6],
v2=LETTERS[1:6],
key=c("id","time")))
id time v1 v2
1: 1 1 a A
2: 1 2 b B
3: 1 3 c C
4: 1 4 d D
5: 2 5 e E
6: 2 6 f F
I want to "roll it up" (is that the right term here?), to a "change" table:
object 1
changed 3 times (from timestamp 1 to 2, 2 to 3, and 3 to 4)
object 2
changed once (time 5 to 6);
I am interested in the initial v1
and final v2
.
So, the result should be:
> (res <- data.table(beg.time=c(1,2,3,5),
end.time=c(2,3,4,6),
v1=c('a','b','c','e'),
v2=c('B','C','D','F'),
key=c("beg.time","end.time")))
beg.time end.time v1 v2
1: 1 2 a B
2: 2 3 b C
3: 3 4 c D
4: 5 6 e F
Rollup tables are tables where totals for (combinations of) conditions are saved. A “summary table”, that holds pre-aggregated values for all conditions you may need to fetch totals for.
Roll-up data is the term given to the data that has been summarized. Before any raw data is purged, the minimum, maximum, and average values are calculated and can be preserved in several ways. Hourly data is the term given to data that is summarized, or rolled-up, every hour.
With rollup tables, you can pre-aggregate your older data for the queries you still need to answer. Then you no longer need to store all of the older data, rather, you can delete the older data or roll it off to slower storage—saving space and computing power.
Thanks for the reproducible example! Here's a shot at it.
First, note that you can use the following head-tail idiom to put entries of a vector that are a set distance apart next to each other:
x <- letters[1:5]
cbind(head(x, -1), tail(x, -1))
# [,1] [,2]
# [1,] "a" "b"
# [2,] "b" "c"
# [3,] "c" "d"
# [4,] "d" "e"
cbind(head(x, -2), tail(x, -2))
# [,1] [,2]
# [1,] "a" "c"
# [2,] "b" "d"
# [3,] "c" "e"
Then, we can use the by
functionality of data.table
to do this operation by group.
mydt[,{
## if there's just one row in the group of ID's, return nothing
if (.N == 1) return(NULL)
else {
list(
## head and tail take the first and last parts of a vector
## this will place an element next to its subsequent element
beg.time = head(time, -1),
end.time = tail(time, -1),
v1 = head(v1, -1),
v2 = tail(v2, -1)
## group by ID
)}}, by = id]
# id beg.time end.time v1 v2
# 1: 1 1 2 a B
# 2: 1 2 3 b C
# 3: 1 3 4 c D
# 4: 2 5 6 e F
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With