Selecting the first positive event

Tags: database, select, dataframe, r, filter

I'm struggling with how to create a subsample of a data frame that keeps just the first positive test, based on the date. I'll show a toy example. Suppose I have the following:

df = data.frame(guy = c("A", "B", "A", "B", "C", "C"),
                test1 = c(1, 1, 0, 0, 1, 0),
                test2 = c(0, 1, 0, 1, 0, 0),
                test3 = c(0, 0, 1, 0, 0, 1),
                date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))
df
   #guy test1 test2 test3       date
#1   A     1     0     0 1999-10-20
#2   B     1     1     0 1999-10-21
#3   A     0     0     1 1999-10-22
#4   B     0     1     0 1999-10-23
#5   C     1     0     0 1999-10-24
#6   C     0     0     1 1999-10-25

Now I want to filter, keeping just the first positive test for each guy (i.e. the row where any of test1, test2 or test3 equals 1) with the earliest date. In my example I'd get the following:

   #guy test1 test2 test3       date
#1   A     1     0     0 1999-10-20
#2   B     1     1     0 1999-10-21
#3   C     1     0     0 1999-10-24

Any hint on how I can do that?

asked Oct 10 '20 14:10

DR15

2 Answers

Another option uses dplyr::top_n():

df = data.frame(guy = c("A", "B", "A", 'B', "C", "C"),
                test1 = c(1, 1, 0, 0, 1, 0),
                test2 = c(0, 1, 0, 1, 0, 0),
                test3 = c(0, 0, 1, 0, 0, 1),
                date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))

library(dplyr)

df %>% 
  filter(test1 | test2 | test3) %>% 
  group_by(guy) %>% 
  top_n(-1, date)
#> # A tibble: 3 x 5
#> # Groups:   guy [3]
#>   guy   test1 test2 test3 date      
#>   <chr> <dbl> <dbl> <dbl> <date>    
#> 1 A         1     0     0 1999-10-20
#> 2 B         1     1     0 1999-10-21
#> 3 C         1     0     0 1999-10-24
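
As a side note, top_n() has been superseded since dplyr 1.0.0 in favour of slice_min() and slice_max(). The same idea could be sketched with slice_min() as below. This is only a minimal sketch on the toy data from the question; the name first_positive and the choice of with_ties = FALSE (so that exactly one row per guy is kept even if two positive tests share a date) are assumptions, not part of the original answer.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

first_positive <- df %>%
  filter(test1 == 1 | test2 == 1 | test3 == 1) %>%  # keep rows with at least one positive test
  group_by(guy) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%     # earliest date within each guy, one row only
  ungroup()

first_positive
```

Dropping with_ties = FALSE would keep every row tied on the earliest date, which is closer to how top_n(-1, date) behaves.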

answered Sep 25 '22 01:09

stefan

A base R option using subset + ave + max.col

subset(
  df,
  as.logical(
    ave(
      max.col(df[grepl("test\\d+", names(df))], "first"),
      guy,
      FUN = function(x) x == min(x)
    )
  ) & (test1|test2|test3)
)

which gives

  guy test1 test2 test3       date
1   A     1     0     0 1999-10-20
2   B     1     1     0 1999-10-21
5   C     1     0     0 1999-10-24
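
As far as I can tell, the max.col() approach above ranks rows within each guy by the position of the first positive test column rather than by date, so it matches the expected output here but might not on data where the earliest positive test is not in the lowest-numbered column. A base R sketch that uses the date column explicitly could look like this; the names test_cols, positive and first_positive_base are illustrative assumptions.

```r
# Toy data from the question
df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

# Columns named test1, test2, ...
test_cols <- grep("^test\\d+$", names(df), value = TRUE)

# Keep rows with at least one positive test
positive <- df[rowSums(df[test_cols] == 1) > 0, ]

# Sort by guy, then by date, so the earliest positive row comes first per guy
positive <- positive[order(positive$guy, positive$date), ]

# Keep the first row for each guy
first_positive_base <- positive[!duplicated(positive$guy), ]

first_positive_base
```

On the toy data this keeps the original rows 1, 2 and 5, the same rows as the output shown above.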

answered Sep 25 '22 01:09

ThomasIsCoding
