I'm working with a data from patient visits that indicates at which dates patients received dosages of various drugs. Here's a sample (the numbers indicate the dosage):
patient <- c("patient1", "patient1", "patient1", "patient2", "patient2", "patient2")
date <- c(2010, 2011, 2012, 2010, 2013, 2018)
drug1 <- c(0, 1, 2, 0, 0, 3)
drug2 <- c(0, 0, 1, 2, 0, 3)
myinput <- data.frame(patient, date, drug1, drug2)
> myinput
   patient date drug1 drug2
1 patient1 2010     0     0
2 patient1 2011     1     0
3 patient1 2012     2     1
4 patient2 2010     0     2
5 patient2 2013     0     0
6 patient2 2018     3     3
I would like to identify the date on which treatment was begun with every drug:
patient <- c("patient1", "patient2")
startDrug1 <- c(2011, 2018)
startDrug2 <- c(2012, 2010)
myoutput <- data.frame(patient, startDrug1, startDrug2)
myoutput
   patient startDrug1 startDrug2
1 patient1       2011       2012
2 patient2       2018       2010
So I would like to obtain the first value in date (reading from top to bottom) for which drug1 or drug2 is > 0. To complicate matters, a drug might be discontinued and begun a second time (as is the case in drug2 for patient2), hence the point with reading from top to bottom.
I'm grateful for any pointers, as I'm a bit stumped by this. Thanks!
You could do it with dplyr::summarize and using which() to index:
library(dplyr)
myinput %>%
  summarise(across(starts_with("drug"), ~ date[which(.x > 0)[1]],
                   .names = "start{.col}"),
            .by = patient)
Output:
#   patient  startdrug1 startdrug2
#   <chr>         <dbl>      <dbl>
# 1 patient1       2011       2012
# 2 patient2       2018       2010
                        I would probably do this with two pivots - first longer to put the drug dosages all in one column, filter out the zeros and find the minimum dates by drug and patient combo. Then pivot wider again to put drugs back in the column:
library(tidyr)
library(dplyr)
patient <- c("patient1", "patient1", "patient1", "patient2", "patient2", "patient2")
date <- c(2010, 2011, 2012, 2010, 2013, 2018)
drug1 <- c(0, 1, 2, 0, 0, 3)
drug2 <- c(0, 0, 1, 2, 0, 3)
myinput <- data.frame(patient, date, drug1, drug2)
myinput %>% 
  pivot_longer(starts_with("drug"), names_to = "drug", values_to = "dosage") %>% 
  group_by(patient, drug) %>% 
  filter(dosage > 0) %>% 
  slice_min(date) %>% 
  select(-dosage) %>% 
  pivot_wider(names_from = "drug", values_from = "date", names_prefix = "start_")
#> # A tibble: 2 × 3
#> # Groups:   patient [2]
#>   patient  start_drug1 start_drug2
#>   <chr>          <dbl>       <dbl>
#> 1 patient1        2011        2012
#> 2 patient2        2018        2010
If you don't like the pivots, you could also do it with mutate() and summarise().  Simply replace the drug dosage values with dates if they are bigger than 0 and NA otherwise.  Then summarise to get the minimum date for each patient for each drug.
myinput %>% 
  mutate(across(starts_with("drug"), ~ifelse(.x == 0, NA, date))) %>% 
  group_by(patient) %>% 
  summarise(across(starts_with("drug"), ~min(.x, na.rm=TRUE)))
#> # A tibble: 2 × 3
#>   patient  drug1 drug2
#>   <chr>    <dbl> <dbl>
#> 1 patient1  2011  2012
#> 2 patient2  2018  2010
Created on 2024-01-29 with reprex v2.0.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With