I have a dataframe with different observations over time. As soon as an ID has a positive value for "Match", the rows with the ID in the dates that follow has to be removed. This is an example dataframe:
Date ID Match
2018-06-06 5 1
2018-06-06 6 0
2018-06-07 5 1
2018-06-07 6 0
2018-06-07 7 1
2018-06-08 5 0
2018-06-08 6 1
2018-06-08 7 1
2018-06-08 8 1
Desired output:
Date ID Match
2018-06-06 5 1
2018-06-06 6 0
2018-06-07 6 0
2018-06-07 7 1
2018-06-08 6 1
2018-06-08 8 1
In other words, because ID=5 has a positive match on 2018-06-06, the rows with ID=5 are removed for the following days BUT the row with the first positive match for this ID is kept.
Reproducable example:
Date <- c("2018-06-06","2018-06-06","2018-06-07","2018-06-07","2018-06-07","2018-06-08","2018-06-08","2018-06-08","2018-06-08")
ID <- c(5,6,5,6,7,5,6,7,8)
Match <- c(1,0,1,0,1,0,1,1,1)
df <- data.frame(Date,ID,Match)
Thank you in advance
One way:
library(data.table)
setDT(df)
df[, Match := as.integer(as.character(Match))] # fix bad format
df[, .SD[shift(cumsum(Match), fill=0) == 0], by=ID]
ID Date Match
1: 5 2018-06-06 1
2: 6 2018-06-06 0
3: 6 2018-06-07 0
4: 6 2018-06-08 1
5: 7 2018-06-07 1
6: 8 2018-06-08 1
We want to drop rows after the first Match == 1.
cumsum
takes the cumulative sum of Match. It is zero until the first Match == 1. We want to keep the latter row and so check cumsum
on the preceding row with shift
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With