Match in lagged group in data.table

Question

I'm trying to create a new column that indicates if an ID was present in a previous group. Here's my data:

data <- data.table(ID = c(1:3, c(9,2,3,4),c(5,1)),
                   groups = c(rep(c("a", "b", "c"), c(3, 4,2))))
   ID groups
1:  1      a
2:  2      a
3:  3      a
4:  9      b
5:  2      b
6:  3      b
7:  4      b
8:  5      c
9:  1      c

I'm not sure how to specify lagged groups. I tried to use shift, but it's not working:

data[,.(ID=ID,match_lagged=ID %in% shift(ID)),by=groups]

Here's my desired result.

The first 3 lines are not matched because there is no previous group. FALSE would also work for these three rows. ID=4 (in group b) is not matched in group a. ID=5 (in group c) is not matched in group b.

Note that ID 1 in group c is not matched in group b so it should be false even though it exists in group a. This is why duplicated(data$ID) does not work. Data from a group has to be matched from the previous group.

groups ID match_lagged
1:      a  1         NA
2:      a  2         NA
3:      a  3         NA
4:      b  9         FALSE
5:      b  2         TRUE
6:      b  3         TRUE
7:      b  4         FALSE
8:      c  5         FALSE
9:      c  1         FALSE

A dplyr solution would also work.

eddi · Accepted Answer

Number the groups, and then check if the diff is equal to one for each ID.

data[, grp.id := .GRP, by = groups]
data[, match_lagged := c(FALSE, diff(grp.id) == 1), by = ID][
     grp.id == 1, match_lagged := NA][]
#   ID groups grp.id match_lagged
#1:  1      a      1           NA
#2:  2      a      1           NA
#3:  3      a      1           NA
#4:  9      b      2        FALSE
#5:  2      b      2         TRUE
#6:  3      b      2         TRUE
#7:  4      b      2        FALSE
#8:  5      c      3        FALSE
#9:  1      c      3        FALSE

This assumes that you find each ID only once in each group. If that's not the case you can unique, do the above, and then merge in.

Match in lagged group in data.table

Tags:

matching

r

match

data.table

dplyr

Pierre Lapointe

1 Answers

eddi

Recent Activity

Donate For Us

Match in lagged group in data.table

Tags:

matching

r

match

data.table

dplyr

Pierre Lapointe

1 Answers

eddi

Related questions

Recent Activity

Donate For Us