I'm trying to create a new column that indicates if an ID was present in a previous group. Here's my data:
data <- data.table(ID = c(1:3, c(9,2,3,4),c(5,1)),
groups = c(rep(c("a", "b", "c"), c(3, 4,2))))
ID groups
1: 1 a
2: 2 a
3: 3 a
4: 9 b
5: 2 b
6: 3 b
7: 4 b
8: 5 c
9: 1 c
I'm not sure how to specify lagged groups. I tried to use shift
, but it's not working:
data[,.(ID=ID,match_lagged=ID %in% shift(ID)),by=groups]
Here's my desired result.
The first 3 lines are not matched because there is no previous group. FALSE would also work for these three rows. ID=4 (in group b) is not matched in group a. ID=5 (in group c) is not matched in group b.
Note that ID 1 in group c is not matched in group b so it should be false even though it exists in group a. This is why duplicated(data$ID)
does not work. Data from a group has to be matched from the previous group.
groups ID match_lagged
1: a 1 NA
2: a 2 NA
3: a 3 NA
4: b 9 FALSE
5: b 2 TRUE
6: b 3 TRUE
7: b 4 FALSE
8: c 5 FALSE
9: c 1 FALSE
A dplyr
solution would also work.
Number the groups, and then check if the diff
is equal to one for each ID
.
data[, grp.id := .GRP, by = groups]
data[, match_lagged := c(FALSE, diff(grp.id) == 1), by = ID][
grp.id == 1, match_lagged := NA][]
# ID groups grp.id match_lagged
#1: 1 a 1 NA
#2: 2 a 1 NA
#3: 3 a 1 NA
#4: 9 b 2 FALSE
#5: 2 b 2 TRUE
#6: 3 b 2 TRUE
#7: 4 b 2 FALSE
#8: 5 c 3 FALSE
#9: 1 c 3 FALSE
This assumes that you find each ID
only once in each group. If that's not the case you can unique, do the above, and then merge in.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With