I have a data set like this:
a <- data.frame(
  var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-02-06",
                   "2015-01-02", "2015-01-07", "2015-04-02")),
  var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)
# Number the rows within each consecutive run of the same patient
sequ <- rle(as.character(a$var1))
a$sequ <- sequence(sequ$lengths)
which produces:
> a
var1 var2 var3 sequ
1 patientA 2015-01-02 FALSE 1
2 patientA 2015-01-04 TRUE 2
3 patientA 2015-02-02 FALSE 3
4 patientB 2015-02-06 FALSE 1
5 patientB 2015-01-02 FALSE 2
6 patientB 2015-01-07 TRUE 3
7 patientB 2015-04-02 FALSE 4
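The 'sequ' column counts rows within each run of consecutive identical 'var1' values: rle() returns the run lengths and sequence() expands each length n into 1:n. Because the counter restarts only when the value changes, it relies on each patient's rows being contiguous. A small sketch contrasting it with base R's ave(), which numbers rows per patient wherever they sit; the vector x is made up for illustration:

```r
# Hypothetical vector in which patientA's rows are not contiguous
x <- c("patientA", "patientA", "patientB", "patientA")

# rle()/sequence(): the counter restarts at every change of value
run_index <- sequence(rle(x)$lengths)
run_index
# [1] 1 2 1 1

# ave(): the counter restarts per distinct value, wherever its rows are
group_index <- ave(seq_along(x), x, FUN = seq_along)
group_index
# [1] 1 2 1 3
```

For the sorted data in the question both approaches give the same 'sequ' values.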
How can I subset this data set so that, for each patient (var1), I keep the row where var3 == TRUE and every row whose var2 date is later than the date on that row? I tried
subset(a, (var3 == TRUE) & (var2 > var3))
but this does not produce the correct result. The expected result is:
# var1 var2 var3 sequ
# 1 patientA 2015-01-04 TRUE 2
# 2 patientA 2015-02-02 FALSE 3
# 3 patientB 2015-02-06 FALSE 1
# 4 patientB 2015-01-07 TRUE 3
# 5 patientB 2015-04-02 FALSE 4
You could try data.table. Here we convert the 'data.frame' to a 'data.table' in place with setDT(a), group by 'var1', build a logical index of the 'var2' values that are greater than or equal to the 'var2' value on the row where 'var3' is TRUE, and use it to subset the group's data (.SD).
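library(data.table)
# Convert in place, then within each patient keep rows dated on or after its var3 == TRUE row
setDT(a)[, .SD[var2 >= var2[var3]], by = var1]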
# var1 var2 var3 sequ
#1: patientA 2015-01-04 TRUE 2
#2: patientA 2015-02-02 FALSE 3
#3: patientB 2015-02-06 FALSE 1
#4: patientB 2015-01-07 TRUE 3
#5: patientB 2015-04-02 FALSE 4
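The data.table call compares against var2[var3], which assumes each patient has exactly one TRUE row: with two, a length-2 vector is recycled against the group's dates, and a patient with none yields no rows. If several TRUE rows can occur, one possible option in base R is to use each patient's earliest TRUE date as the threshold via ave(). This is a sketch under that assumption (earliest TRUE date wins, patients without a TRUE row are dropped); the toy data below is made up and not from the question:

```r
# Hypothetical data: patientA has two TRUE rows, patientB has none,
# patientC has exactly one
toy <- data.frame(
  var1 = c("patientA", "patientA", "patientA", "patientB", "patientB",
           "patientC", "patientC"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-01-02",
                   "2015-03-01", "2015-05-05", "2015-06-01")),
  var3 = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Dates as day counts; rows where var3 is FALSE get Inf so that min()
# picks the earliest TRUE date within each patient
day <- as.numeric(toy$var2)
ref <- ave(ifelse(toy$var3, day, Inf), toy$var1, FUN = min)

# Patients without any TRUE row have ref = Inf, so none of their rows pass
out <- toy[day >= ref, ]
out
# keeps rows 2, 3, 6 and 7
```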
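An option using base R (assuming the data are sorted by 'var1' and each patient has exactly one row where 'var3' is TRUE):
# Repeat each patient's var3 == TRUE date once per row of that patient, then compare
a[with(a, var2 >= rep(var2[var3], table(var1))), ]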
# var1 var2 var3 sequ
#2 patientA 2015-01-04 TRUE 2
#3 patientA 2015-02-02 FALSE 3
#4 patientB 2015-02-06 FALSE 1
#6 patientB 2015-01-07 TRUE 3
#7 patientB 2015-04-02 FALSE 4
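If the rows might not be sorted by patient, a base R variant that does not depend on row order is to look up each row's reference date with match() instead of rep()/table(). A sketch, assuming at most one TRUE row per patient (match() would silently take the first one); the row shuffle is only there to show that ordering does not matter:

```r
# Data from the question, with the rows deliberately shuffled
dat <- data.frame(
  var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-02-06",
                   "2015-01-02", "2015-01-07", "2015-04-02")),
  var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)
shuffled <- dat[c(7, 2, 4, 1, 6, 3, 5), ]

# For each row, look up its patient's var3 == TRUE date; match() gives
# NA for patients that have no TRUE row
true_rows <- shuffled[shuffled$var3, ]
ref <- true_rows$var2[match(shuffled$var1, true_rows$var1)]

# which() drops rows whose comparison is NA
res <- shuffled[which(shuffled$var2 >= ref), ]

# Restore the original row order to compare with the expected result
out <- res[order(as.integer(rownames(res))), ]
out
# rows 2, 3, 4, 6 and 7, as in the expected result
```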