Selecting the first positive event

I'm struggling to create a subsample of a data frame that keeps only the first positive test, based on the date. Here is a toy example. Suppose I have the following:

df = data.frame(guy = c("A", "B", "A", "B", "C", "C"),
                test1 = c(1, 1, 0, 0, 1, 0),
                test2 = c(0, 1, 0, 1, 0, 0),
                test3 = c(0, 0, 1, 0, 0, 1),
                date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))
df
#  guy test1 test2 test3       date
#1   A     1     0     0 1999-10-20
#2   B     1     1     0 1999-10-21
#3   A     0     0     1 1999-10-22
#4   B     0     1     0 1999-10-23
#5   C     1     0     0 1999-10-24
#6   C     0     0     1 1999-10-25

Now I want to filter the data, keeping for each guy only the first positive test (i.e. a row where test1, test2 or test3 equals 1) with the oldest date. In my example I'd get the following:

#  guy test1 test2 test3       date
#1   A     1     0     0 1999-10-20
#2   B     1     1     0 1999-10-21
#3   C     1     0     0 1999-10-24

Any hints on how I can do that?

DR15 asked Oct 10 '20 14:10

2 Answers

Another option, using dplyr::top_n():

df = data.frame(guy = c("A", "B", "A", 'B', "C", "C"),
                test1 = c(1, 1, 0, 0, 1, 0),
                test2 = c(0, 1, 0, 1, 0, 0),
                test3 = c(0, 0, 1, 0, 0, 1),
                date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))

library(dplyr)

df %>% 
  filter(test1 | test2 | test3) %>% 
  group_by(guy) %>% 
  top_n(-1, date)
#> # A tibble: 3 x 5
#> # Groups:   guy [3]
#>   guy   test1 test2 test3 date      
#>   <chr> <dbl> <dbl> <dbl> <date>    
#> 1 A         1     0     0 1999-10-20
#> 2 B         1     1     0 1999-10-21
#> 3 C         1     0     0 1999-10-24
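
As a side note, top_n() is superseded in dplyr 1.0.0 and later in favour of slice_min() and slice_max(). A minimal sketch of the same idea with slice_min() follows, assuming a recent dplyr is installed; the name first_positive is only illustrative, and with_ties = FALSE is an assumption that you want exactly one row per guy even if two positive rows share the earliest date.

```r
library(dplyr)

df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

first_positive <- df %>%
  # keep rows with at least one positive test
  filter(test1 == 1 | test2 == 1 | test3 == 1) %>%
  group_by(guy) %>%
  # earliest remaining date per guy; with_ties = FALSE returns one row per group
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

first_positive
```

Dropping with_ties = FALSE would keep every row tied for the earliest date, which may be what you want if a guy can have several records on the same day.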
stefan answered Sep 25 '22 01:09

A base R option using subset + ave + max.col

subset(
  df,
  as.logical(
    ave(
      max.col(df[grepl("test\\d+", names(df))], "first"),
      guy,
      FUN = function(x) x == min(x)
    )
  ) & (test1|test2|test3)
)

which gives

  guy test1 test2 test3       date
1   A     1     0     0 1999-10-20
2   B     1     1     0 1999-10-21
5   C     1     0     0 1999-10-24
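
One caveat: the max.col() call ranks rows by the column position of their first positive test, not by date, so it matches the date-based answer on this toy data but need not do so in general. If the earliest date is what matters, a base R sketch along these lines may be closer to the question; the helper names test_cols, pos and first_positive are illustrative, and the regular expression assumes the test columns are named test1, test2, and so on.

```r
df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

# Logical index of the test columns (assumes names like test1, test2, ...)
test_cols <- grepl("^test\\d+$", names(df))

# Keep only rows where at least one test is positive
pos <- df[rowSums(df[test_cols] == 1) > 0, ]

# Sort by date, then keep the first row encountered for each guy
pos <- pos[order(pos$date), ]
first_positive <- pos[!duplicated(pos$guy), ]

first_positive
```

On the example data this keeps rows 1, 2 and 5, the same rows as the output above.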
ThomasIsCoding answered Sep 25 '22 01:09
