
Match in lagged group in data.table

I'm trying to create a new column that indicates if an ID was present in a previous group. Here's my data:

library(data.table)

data <- data.table(ID = c(1:3, c(9, 2, 3, 4), c(5, 1)),
                   groups = rep(c("a", "b", "c"), c(3, 4, 2)))
   ID groups
1:  1      a
2:  2      a
3:  3      a
4:  9      b
5:  2      b
6:  3      b
7:  4      b
8:  5      c
9:  1      c

I'm not sure how to specify lagged groups. I tried to use shift, but it's not working:

data[,.(ID=ID,match_lagged=ID %in% shift(ID)),by=groups]

Here's my desired result.

The first 3 lines are not matched because there is no previous group. FALSE would also work for these three rows. ID=4 (in group b) is not matched in group a. ID=5 (in group c) is not matched in group b.

Note that ID 1 in group c is not matched in group b, so it should be FALSE even though it exists in group a. This is why duplicated(data$ID) does not work: each group has to be matched against the group immediately before it.

   groups ID match_lagged
1:      a  1           NA
2:      a  2           NA
3:      a  3           NA
4:      b  9        FALSE
5:      b  2         TRUE
6:      b  3         TRUE
7:      b  4        FALSE
8:      c  5        FALSE
9:      c  1        FALSE

A dplyr solution would also work.

Pierre Lapointe asked Jun 22 '17 21:06



1 Answer

Number the groups in order of appearance, then, within each ID, check whether the difference between consecutive group numbers is exactly one.

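# Number the groups in order of appearance
data[, grp.id := .GRP, by = groups]
# Within each ID, TRUE when the previous occurrence was in the immediately preceding group
data[, match_lagged := c(FALSE, diff(grp.id) == 1), by = ID][
     grp.id == 1, match_lagged := NA][]  # the first group has no previous group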
#   ID groups grp.id match_lagged
#1:  1      a      1           NA
#2:  2      a      1           NA
#3:  3      a      1           NA
#4:  9      b      2        FALSE
#5:  2      b      2         TRUE
#6:  3      b      2         TRUE
#7:  4      b      2        FALSE
#8:  5      c      3        FALSE
#9:  1      c      3        FALSE

This assumes that each ID appears at most once in each group. If that's not the case, you can deduplicate with unique(), apply the steps above, and then merge the result back into the original data.
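
For the case where an ID can repeat within a group, here is a minimal sketch of that deduplicate-then-merge idea. The table `dup`, the helper table `u`, and the use of an update join for the merge step are assumptions made for illustration; they are not part of the original answer.

```r
library(data.table)

# Hypothetical data: ID 2 appears twice in group b
dup <- data.table(ID     = c(1, 2, 3, 9, 2, 2, 3, 4, 5, 1),
                  groups = rep(c("a", "b", "c"), c(3, 5, 2)))

# Number the groups in order of appearance
dup[, grp.id := .GRP, by = groups]

# Keep one row per (ID, group), then run the same diff check as above
u <- unique(dup[, .(ID, grp.id)])
u[, match_lagged := c(FALSE, diff(grp.id) == 1), by = ID]
u[grp.id == 1, match_lagged := NA]

# Merge the flag back onto every original row (update join, row order is kept)
dup[u, on = .(ID, grp.id), match_lagged := i.match_lagged]
dup[]
```

Both rows for ID 2 in group b should end up TRUE, since ID 2 is present in group a. Like the code above, this relies on the rows being ordered by group.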

eddi answered Sep 22 '22 04:09
