
Select value in first column for which condition in second column is fulfilled

Tags: r

I'm working with data from patient visits that indicates the dates on which patients received dosages of various drugs. Here's a sample (the numbers indicate the dosage):

patient <- c("patient1", "patient1", "patient1", "patient2", "patient2", "patient2")
date <- c(2010, 2011, 2012, 2010, 2013, 2018)
drug1 <- c(0, 1, 2, 0, 0, 3)
drug2 <- c(0, 0, 1, 2, 0, 3)
myinput <- data.frame(patient, date, drug1, drug2)

> myinput
   patient date drug1 drug2
1 patient1 2010     0     0
2 patient1 2011     1     0
3 patient1 2012     2     1
4 patient2 2010     0     2
5 patient2 2013     0     0
6 patient2 2018     3     3

For each patient, I would like to identify the date on which treatment with each drug began:

patient <- c("patient1", "patient2")
startDrug1 <- c(2011, 2018)
startDrug2 <- c(2012, 2010)
myoutput <- data.frame(patient, startDrug1, startDrug2)

myoutput
   patient startDrug1 startDrug2
1 patient1       2011       2012
2 patient2       2018       2010

So, for each patient, I would like to obtain the first value in date (reading from top to bottom) for which drug1 or drug2 is > 0. To complicate matters, a drug might be discontinued and started a second time (as is the case with drug2 for patient2), which is why only the first occurrence, reading from top to bottom, should count.

I'm grateful for any pointers, as I'm a bit stumped by this. Thanks!

Nereus asked Oct 29 '25 02:10


2 Answers

You could do it with dplyr::summarise(), using which() to index the first row with a positive dosage (the .by argument requires dplyr 1.1.0 or later):

library(dplyr)

myinput %>%
  summarise(across(starts_with("drug"), ~ date[which(.x > 0)[1]],
                   .names = "start{.col}"),
            .by = patient)

Output:

#   patient  startdrug1 startdrug2
#   <chr>         <dbl>      <dbl>
# 1 patient1       2011       2012
# 2 patient2       2018       2010
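
One caveat: which(.x > 0)[1] picks the first matching row in data order, so the approach relies on the rows being sorted by date within each patient. If that is not guaranteed, a minimal sketch (assuming the same column names as in the question, with the rows deliberately shuffled here for illustration) is to sort first:

```r
library(dplyr)

# Same values as the question's sample data, but with the rows shuffled
# (the shuffling is an assumption made for this illustration)
shuffled <- data.frame(
  patient = c("patient2", "patient1", "patient1", "patient2", "patient1", "patient2"),
  date    = c(2018, 2012, 2010, 2010, 2011, 2013),
  drug1   = c(3, 2, 0, 0, 1, 0),
  drug2   = c(3, 1, 0, 2, 0, 0)
)

res <- shuffled %>%
  arrange(patient, date) %>%   # restore chronological order within each patient
  summarise(across(starts_with("drug"), ~ date[which(.x > 0)[1]],
                   .names = "start{.col}"),
            .by = patient)
```

If a patient never receives a drug, which(.x > 0) is empty and the indexing yields NA for that start date.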

jpsmith answered Oct 31 '25 17:10


I would probably do this with two pivots: first pivot longer to put all the drug dosages in one column, filter out the zeros, and find the minimum date for each patient and drug combination. Then pivot wider again to put the drugs back in columns:

library(tidyr)
library(dplyr)
patient <- c("patient1", "patient1", "patient1", "patient2", "patient2", "patient2")
date <- c(2010, 2011, 2012, 2010, 2013, 2018)
drug1 <- c(0, 1, 2, 0, 0, 3)
drug2 <- c(0, 0, 1, 2, 0, 3)
myinput <- data.frame(patient, date, drug1, drug2)
myinput %>% 
  pivot_longer(starts_with("drug"), names_to = "drug", values_to = "dosage") %>% 
  group_by(patient, drug) %>% 
  filter(dosage > 0) %>% 
  slice_min(date) %>% 
  select(-dosage) %>% 
  pivot_wider(names_from = "drug", values_from = "date", names_prefix = "start_")
#> # A tibble: 2 × 3
#> # Groups:   patient [2]
#>   patient  start_drug1 start_drug2
#>   <chr>          <dbl>       <dbl>
#> 1 patient1        2011        2012
#> 2 patient2        2018        2010

If you don't like the pivots, you could also do it with mutate() and summarise(). Simply replace each drug dosage value with the date if the dosage is greater than 0 and with NA otherwise. Then summarise to get the minimum date for each patient and each drug.

myinput %>% 
  mutate(across(starts_with("drug"), ~ifelse(.x == 0, NA, date))) %>% 
  group_by(patient) %>% 
  summarise(across(starts_with("drug"), ~min(.x, na.rm=TRUE)))
#> # A tibble: 2 × 3
#>   patient  drug1 drug2
#>   <chr>    <dbl> <dbl>
#> 1 patient1  2011  2012
#> 2 patient2  2018  2010

Created on 2024-01-29 with reprex v2.0.2
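
One thing to watch with the min(.x, na.rm = TRUE) version: if a patient never received a given drug, every value in that group is NA, and min() returns Inf with a warning. A possible workaround, sketched here with a hypothetical helper first_date() and made-up data in which patient2 never receives drug1 (neither is part of the original answer):

```r
library(dplyr)

# Hypothetical helper: minimum that returns NA instead of Inf for all-NA input
first_date <- function(x) {
  if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}

# Illustrative data (an assumption): patient2 never receives drug1
myinput2 <- data.frame(
  patient = c("patient1", "patient1", "patient2", "patient2"),
  date    = c(2010, 2011, 2010, 2013),
  drug1   = c(0, 1, 0, 0),
  drug2   = c(1, 0, 2, 0)
)

res2 <- myinput2 %>%
  # replace positive dosages with the visit date, everything else with NA
  mutate(across(starts_with("drug"), ~ ifelse(.x > 0, date, NA))) %>%
  group_by(patient) %>%
  summarise(across(starts_with("drug"), first_date, .names = "start_{.col}"))
```

Here start_drug1 for patient2 comes out as NA rather than Inf, which tends to be easier to handle downstream.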

DaveArmstrong answered Oct 31 '25 17:10


