Roll up a data.table

Tags: r, data.table

I have a data table:

> (mydt <- data.table(id=c(1,1,1,1,2,2),
                      time=1:6,
                      v1=letters[1:6],
                      v2=LETTERS[1:6],
                      key=c("id","time")))
   id time v1 v2
1:  1    1  a  A
2:  1    2  b  B
3:  1    3  c  C
4:  1    4  d  D
5:  2    5  e  E
6:  2    6  f  F

I want to "roll it up" (is that the right term here?) into a "change" table: object 1 changed 3 times (from timestamp 1 to 2, 2 to 3, and 3 to 4), and object 2 changed once (from timestamp 5 to 6). For each change I am interested in the initial v1 and the final v2. So, the result should be:

> (res <- data.table(beg.time=c(1,2,3,5),
                     end.time=c(2,3,4,6),
                     v1=c('a','b','c','e'),
                     v2=c('B','C','D','F'),
                     key=c("beg.time","end.time")))
   beg.time end.time v1 v2
1:        1        2  a  B
2:        2        3  b  C
3:        3        4  c  D
4:        5        6  e  F


asked Sep 17 '13 15:09

sds

1 Answer

Thanks for the reproducible example! Here's a shot at it.

First, note that you can use the following head-tail idiom to pair each entry of a vector with the entry a fixed distance after it:

x <- letters[1:5]
cbind(head(x, -1), tail(x, -1))
#      [,1] [,2]
# [1,] "a"  "b"
# [2,] "b"  "c"
# [3,] "c"  "d"
# [4,] "d"  "e"
cbind(head(x, -2), tail(x, -2))
#      [,1] [,2]
# [1,] "a"  "c"
# [2,] "b"  "d"
# [3,] "c"  "e"
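If you need the idiom in more than one place, it can be wrapped in a small function. This is only a sketch: `pair_with_lag` and its column names `from` and `to` are names made up for illustration, not anything from base R or data.table.

```r
# Hypothetical helper (illustration only): pair each element of x with the
# element k positions after it, using the head/tail idiom shown above.
pair_with_lag <- function(x, k = 1L) {
  stopifnot(k >= 1L)
  # head(x, -k) drops the last k elements, tail(x, -k) drops the first k,
  # so the two vectors line up as (x[i], x[i + k]).
  cbind(from = head(x, -k), to = tail(x, -k))
}

pairs1 <- pair_with_lag(letters[1:5])      # neighbours: a-b, b-c, c-d, d-e
pairs2 <- pair_with_lag(letters[1:5], 2L)  # two apart:  a-c, b-d, c-e
single <- pair_with_lag("a")               # nothing to pair with: zero rows
```

Note that a vector shorter than or equal to `k` simply yields a matrix with no rows, which is why short groups tend to drop out on their own when this idiom is applied by group.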

Then, we can use the by functionality of data.table to do this operation by group.

mydt[, {
    ## a group with a single row has no change to report; if() without an
    ## else then yields NULL, so that group contributes no rows
    if (.N > 1) {
        list(
            ## head(x, -1) drops the last element, tail(x, -1) drops the first,
            ## so each element is placed next to its subsequent element
            beg.time = head(time, -1),
            end.time = tail(time, -1),
            v1 = head(v1, -1),
            v2 = tail(v2, -1)
        )
    }
}, by = id]   ## group by ID

#    id beg.time end.time v1 v2
# 1:  1        1        2  a  B
# 2:  1        2        3  b  C
# 3:  1        3        4  c  D
# 4:  2        5        6  e  F
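As an alternative sketch, assuming a version of data.table recent enough to provide `shift()`, the same table can be built by leading `time` and `v2` within each group and dropping the last row of each group, where the lead is `NA`. The last two lines are optional: they drop `id` and set the key so the result has the shape asked for in the question. The name `res2` is just a placeholder.

```r
library(data.table)

# Same example data as in the question.
mydt <- data.table(id = c(1, 1, 1, 1, 2, 2),
                   time = 1:6,
                   v1 = letters[1:6],
                   v2 = LETTERS[1:6],
                   key = c("id", "time"))

# shift(x, type = "lead") moves each value up by one position within the
# group and pads the end with NA, so the last row of every group has
# end.time == NA and is filtered out afterwards.
res2 <- mydt[, .(beg.time = time,
                 end.time = shift(time, type = "lead"),
                 v1 = v1,
                 v2 = shift(v2, type = "lead")),
             by = id][!is.na(end.time)]

# Optional: match the layout requested in the question.
res2[, id := NULL]
setkey(res2, beg.time, end.time)
```

A group with a single row produces only the `NA` row, so it disappears in the filter without needing an explicit check on `.N`.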


answered Sep 19 '22 04:09

Blue Magister
