using data.table to flag the first (or last) record in a group

Given a sort key, is there a data.table shortcut that duplicates the first and last functionality found in SAS and SPSS?

The pedestrian approach below flags the first record of a group.

Given the elegance of data.table (with which I'm slowly getting familiar), I'm assuming there's a shortcut using a self join & mult, but I'm still trying to figure it out.

Here's the example:

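    require(data.table)

    set.seed(123)
    n <- 17
    DT <- data.table(x = sample(letters[1:3], n, replace = TRUE),
                     y = sample(LETTERS[1:3], n, replace = TRUE))

    # Sort the table by the grouping columns
    sortkey <- c("x", "y")
    setkeyv(DT, sortkey)

    # Build one string per row, then compare each row with the one before it
    key <- paste(DT$x, DT$y, sep = "-")
    nw <- c(TRUE, key[2:n] != key[1:(n - 1)])

    # 1 marks the first record of each group, 0 everything else
    DT$first <- 1 * nw
    DT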

asked May 06 '12 21:05

M.Dimo

2 Answers

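One easy way is to use the duplicated() function. Applied to a data frame, it returns a logical vector in which an entry is TRUE if and only if that row's combination of values has already occurred earlier, moving down the data frame. Negating it therefore flags the first record of each group, and doing the same on the reversed rows flags the last.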

    DT$first <- !duplicated( DT[, list(x,y) ])
    DT$last <- rev(!duplicated( DT[, list(rev(x),rev(y)) ]))
    > DT
          x y first  last
     [1,] a A  TRUE  TRUE
     [2,] a B  TRUE  TRUE
     [3,] a C  TRUE FALSE
     [4,] a C FALSE  TRUE
     [5,] b A  TRUE  TRUE
     [6,] b B  TRUE  TRUE
     [7,] b C  TRUE FALSE
     [8,] b C FALSE FALSE
     [9,] b C FALSE FALSE
    [10,] b C FALSE  TRUE
    [11,] c A  TRUE FALSE
    [12,] c A FALSE  TRUE
    [13,] c B  TRUE FALSE
    [14,] c B FALSE  TRUE
    [15,] c C  TRUE FALSE
    [16,] c C FALSE FALSE
    [17,] c C FALSE  TRUE
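For the "last" flag, reversing every column works, but base R's duplicated() also takes a fromLast argument that scans from the bottom up. Here is a minimal sketch on a plain data frame; df and its values are made up for illustration, and I'm assuming the rows are already sorted by x and y. data.table's own duplicated method should accept the same argument, though that is worth checking against your installed version.

```r
# Hypothetical toy data frame, already sorted by x, then y.
df <- data.frame(
  x = c("a", "a", "a", "b", "b"),
  y = c("A", "C", "C", "B", "B"),
  stringsAsFactors = FALSE
)

# First record of each (x, y) group: the combination has not been seen yet.
df$first <- !duplicated(df[, c("x", "y")])

# Last record of each group: scan from the bottom instead of reversing columns.
df$last <- !duplicated(df[, c("x", "y")], fromLast = TRUE)

print(df)
```

A group with a single row ends up flagged as both first and last, which matches the output above.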

Another way without using duplicated() is:

    DT[ unique(DT), list(first = c(1, rep(0,length(y)-1)),
                         last =  c(rep(0,length(y)-1),1 )) ]
          x y  first last
     [1,] a A     1    1
     [2,] a B     1    1
     [3,] a C     1    0
     [4,] a C     0    1
     [5,] b A     1    1
     [6,] b B     1    1
     [7,] b C     1    0
     [8,] b C     0    0
     [9,] b C     0    0
    [10,] b C     0    1
    [11,] c A     1    0
    [12,] c A     0    1
    [13,] c B     1    0
    [14,] c B     0    1
    [15,] c C     1    0
    [16,] c C     0    0
    [17,] c C     0    1
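The join above depends on data.table evaluating j once per row of unique(DT). As far as I know, recent releases no longer do that implicitly (they want by = .EACHI), so a version with explicit grouping may be safer. This is only a sketch under that assumption: DT2 is a made-up toy table, and it uses .N (the group size) rather than length(y).

```r
library(data.table)

# Hypothetical toy table, keyed (and therefore sorted) by x and y.
DT2 <- data.table(
  x = c("a", "a", "a", "b", "b"),
  y = c("A", "C", "C", "B", "B"),
  key = c("x", "y")
)

# Within each (x, y) group, position 1 is the first record and
# position .N (the group size) is the last.
DT2[, `:=`(
  first = as.integer(seq_len(.N) == 1L),
  last  = as.integer(seq_len(.N) == .N)
), by = .(x, y)]

print(DT2)
```

Because := adds the columns by reference, the rows keep their original order and no join is needed.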

answered Oct 23 '22 04:10

Prasad Chalasani

Here are a couple of solutions using data.table:

    ## Option 1 (cleaner solution, added 2016-11-29)
    uDT <- unique(DT)
    DT[, c("first","last"):=0L]
    DT[uDT, first:=1L, mult="first"]
    DT[uDT, last:=1L, mult="last"]

    ## Option 2 (original answer, retained for posterity)
    DT <- cbind(DT, first=0L, last=0L)
    DT[DT[unique(DT),,mult="first", which=TRUE], first:=1L]
    DT[DT[unique(DT),,mult="last", which=TRUE], last:=1L]

    head(DT)
    #      x y first last
    # [1,] a A     1    1
    # [2,] a B     1    1
    # [3,] a C     1    0
    # [4,] a C     0    1
    # [5,] b A     1    1
    # [6,] b B     1    1

There's obviously a lot packed into each of those lines. The key construct, though, is the following, which returns the row index of the first record in each group:

    DT[unique(DT),,mult="first", which=TRUE]
    # [1]  1  2  3  5  6  7 11 13 15
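The same row indices can also be obtained without a join, using the special symbol .I (the row numbers of the table) inside a grouped query. This is a swapped-in alternative rather than what the answer uses, sketched on a made-up table DT3; first_idx and last_idx are names I picked for illustration.

```r
library(data.table)

# Hypothetical toy table, keyed by x and y.
DT3 <- data.table(
  x = c("a", "a", "a", "b", "b"),
  y = c("A", "C", "C", "B", "B"),
  key = c("x", "y")
)

# .I holds row numbers of DT3; take the first and last one within each group.
first_idx <- DT3[, .I[1L], by = .(x, y)]$V1
last_idx  <- DT3[, .I[.N], by = .(x, y)]$V1

# Start from all zeros, then switch on the flagged rows by index.
DT3[, c("first", "last") := 0L]
DT3[first_idx, first := 1L]
DT3[last_idx,  last  := 1L]

print(DT3)
```

Here first_idx plays the same role as the which=TRUE result above: a plain integer vector of row positions that can be fed straight back into DT3[...].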

answered Oct 23 '22 04:10

Josh O'Brien
