I'm working with a data from patient visits that indicates at which dates patients received dosages of various drugs. Here's a sample (the numbers indicate the dosage):
patient <- c("patient1", "patient1", "patient1", "patient2", "patient2", "patient2")
date <- c(2010, 2011, 2012, 2010, 2013, 2018)
drug1 <- c(0, 1, 2, 0, 0, 3)
drug2 <- c(0, 0, 1, 2, 0, 3)
myinput <- data.frame(patient, date, drug1, drug2)
> myinput
patient date drug1 drug2
1 patient1 2010 0 0
2 patient1 2011 1 0
3 patient1 2012 2 1
4 patient2 2010 0 2
5 patient2 2013 0 0
6 patient2 2018 3 3
I would like to identify the date on which treatment was begun with every drug:
patient <- c("patient1", "patient2")
startDrug1 <- c(2011, 2018)
startDrug2 <- c(2012, 2010)
myoutput <- data.frame(patient, startDrug1, startDrug2)
myoutput
patient startDrug1 startDrug2
1 patient1 2011 2012
2 patient2 2018 2010
So I would like to obtain the first value in date (reading from top to bottom) for which drug1 or drug2 is > 0. To complicate matters, a drug might be discontinued and begun a second time (as is the case in drug2 for patient2), hence the point with reading from top to bottom.
I'm grateful for any pointers, as I'm a bit stumped by this. Thanks!
You could do it with dplyr::summarize and using which() to index:
library(dplyr)
myinput %>%
summarise(across(starts_with("drug"), ~ date[which(.x > 0)[1]],
.names = "start{.col}"),
.by = patient)
Output:
# patient startdrug1 startdrug2
# <chr> <dbl> <dbl>
# 1 patient1 2011 2012
# 2 patient2 2018 2010
I would probably do this with two pivots - first longer to put the drug dosages all in one column, filter out the zeros and find the minimum dates by drug and patient combo. Then pivot wider again to put drugs back in the column:
library(tidyr)
library(dplyr)
patient <- c("patient1", "patient1", "patient1", "patient2", "patient2", "patient2")
date <- c(2010, 2011, 2012, 2010, 2013, 2018)
drug1 <- c(0, 1, 2, 0, 0, 3)
drug2 <- c(0, 0, 1, 2, 0, 3)
myinput <- data.frame(patient, date, drug1, drug2)
myinput %>%
pivot_longer(starts_with("drug"), names_to = "drug", values_to = "dosage") %>%
group_by(patient, drug) %>%
filter(dosage > 0) %>%
slice_min(date) %>%
select(-dosage) %>%
pivot_wider(names_from = "drug", values_from = "date", names_prefix = "start_")
#> # A tibble: 2 × 3
#> # Groups: patient [2]
#> patient start_drug1 start_drug2
#> <chr> <dbl> <dbl>
#> 1 patient1 2011 2012
#> 2 patient2 2018 2010
If you don't like the pivots, you could also do it with mutate() and summarise(). Simply replace the drug dosage values with dates if they are bigger than 0 and NA otherwise. Then summarise to get the minimum date for each patient for each drug.
myinput %>%
mutate(across(starts_with("drug"), ~ifelse(.x == 0, NA, date))) %>%
group_by(patient) %>%
summarise(across(starts_with("drug"), ~min(.x, na.rm=TRUE)))
#> # A tibble: 2 × 3
#> patient drug1 drug2
#> <chr> <dbl> <dbl>
#> 1 patient1 2011 2012
#> 2 patient2 2018 2010
Created on 2024-01-29 with reprex v2.0.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With