I'm struggling to create a subsample of a data frame that keeps only the first positive test for each guy, based on the date. Here is a toy example. Suppose I have the following:
df = data.frame(guy = c("A", "B", "A", 'B', "C", "C"),
test1 = c(1, 1, 0, 0, 1, 0),
test2 = c(0, 1, 0, 1, 0, 0),
test3 = c(0, 0, 1, 0, 0, 1),
date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')));df
#guy test1 test2 test3 date
#1 A 1 0 0 1999-10-20
#2 B 1 1 0 1999-10-21
#3 A 0 0 1 1999-10-22
#4 B 0 1 0 1999-10-23
#5 C 1 0 0 1999-10-24
#6 C 0 0 1 1999-10-25
Now I want to filter it, keeping just the first positive test (i.e. test1|test2|test3 = 1) for each guy, based on the oldest date. In my example I'd get the following:
#guy test1 test2 test3 date
#1 A 1 0 0 1999-10-20
#2 B 1 1 0 1999-10-21
#3 C 1 0 0 1999-10-24
Any hint on how I can do that?
Another option would be to use dplyr::top_n:
df = data.frame(guy = c("A", "B", "A", 'B', "C", "C"),
test1 = c(1, 1, 0, 0, 1, 0),
test2 = c(0, 1, 0, 1, 0, 0),
test3 = c(0, 0, 1, 0, 0, 1),
date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))
library(dplyr)
df %>%
filter(test1 | test2 | test3) %>%
group_by(guy) %>%
top_n(-1, date)
#> # A tibble: 3 x 5
#> # Groups: guy [3]
#> guy test1 test2 test3 date
#> <chr> <dbl> <dbl> <dbl> <date>
#> 1 A 1 0 0 1999-10-20
#> 2 B 1 1 0 1999-10-21
#> 3 C 1 0 0 1999-10-24
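In recent dplyr releases top_n is marked as superseded in favour of slice_min / slice_max, so the same idea can also be sketched as below. This assumes dplyr 1.0.0 or later; the name first_pos is only illustrative, and with_ties = FALSE is my choice so that each guy yields exactly one row even if two positive tests share a date.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

first_pos <- df %>%
  # keep rows where at least one test is positive
  filter(test1 == 1 | test2 == 1 | test3 == 1) %>%
  group_by(guy) %>%
  # earliest date within each guy; drop ties so one row per guy remains
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

first_pos
```

If tied dates should all be kept, leaving with_ties at its default (TRUE) would return every row that shares the earliest date.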
A base R option using subset + ave + max.col:
subset(
df,
as.logical(
ave(
max.col(df[grepl("test\\d+", names(df))], "first"),
guy,
FUN = function(x) x == min(x)
)
) & (test1|test2|test3)
)
which gives
guy test1 test2 test3 date
1 A 1 0 0 1999-10-20
2 B 1 1 0 1999-10-21
5 C 1 0 0 1999-10-24
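One thing to keep in mind: max.col returns the index of the first positive test column in each row, so the code above picks, per guy, the rows whose positive test has the lowest column index, and the date column is never consulted. It matches the expected output for this toy data, but it may not on data where an earlier date carries a higher-numbered test. A minimal base R sketch that orders by date explicitly could look like this (the names test_cols, pos and first_pos are only illustrative):

```r
# Toy data from the question
df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

# Columns named test1, test2, ...
test_cols <- grep("^test\\d+$", names(df), value = TRUE)

# Keep only rows with at least one positive test
pos <- df[rowSums(df[test_cols] == 1) > 0, ]

# Sort by date, then keep the first row encountered for each guy
pos <- pos[order(pos$date), ]
first_pos <- pos[!duplicated(pos$guy), ]

first_pos
```

Because duplicated() flags every occurrence after the first, sorting by date beforehand means the row that survives for each guy is the earliest positive one.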