I have data for one patient in R, which shows dates when they tested positive for a certain condition. The data looks like this: <pre class="prettyprint"><code> date positive 2005-02-22 yes 2005-04-26 no 2005-08-02 yes 2005-10-04 no 2005-12-06 yes 2006-03-14 no 2006-06-06 no 2006-09-12 yes 2006-12-19 yes 2007-03-27 yes </code></pre> Now I introduce a new definition. The patient's condition is defined as "chronic positive" if "current test is positive, and >=50% of tests in the prior 365 days were positive". So I want to create an output dataset that tells me whether the patient was chronically positive at each date. For example, the output should look like this (e.g. on 2006-09-12, they are "positive" but not "chronic positive" because 3 out of 4 visits in the previous 365 days were negative): <pre class="prettyprint"><code> date positive chronic 2005-02-22 yes no 2005-04-26 no no 2005-08-02 yes yes 2005-10-04 no no 2005-12-06 yes yes 2006-03-14 no no 2006-06-06 no no 2006-09-12 yes no 2006-12-19 yes no 2007-03-27 yes yes </code></pre> How can I do this? At each row of interest, I need to be able to look at previous rows (within the last 365 days) and assess what proportion of them were positive. I think I could use a combination of the <code>lead</code>/<code>lag</code> functions and <code>dplyr</code>, but I would appreciate an example of how this can be done. The original data can be reproduced with: <pre class="prettyprint"><code>dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"), positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")), row.names = c(NA, 10L), class = "data.frame") </code></pre>

Another option using non-equi join in <code>data.table</code>: <pre class="prettyprint"><code>library(data.table) setDT(dat)[, yrago := date - 365L] dat[, chronic := fifelse( .SD[.SD, on=.(date>=yrago, date<date), by=.EACHI, .N>0 & i.positive=="yes" & sum(x.positive=="yes")/.N >= 0.5]$V1, "yes", "no") ] dat[, yrago := NULL][] </code></pre> output: <pre class="prettyprint"><code> date positive chronic 1: 2005-02-22 yes no 2: 2005-04-26 no no 3: 2005-08-02 yes yes 4: 2005-10-04 no no 5: 2005-12-06 yes yes 6: 2006-03-14 no no 7: 2006-06-06 no no 8: 2006-09-12 yes no 9: 2006-12-19 yes no 10: 2007-03-27 yes yes </code></pre>

You can also use this solution in case you don't want to use rolling functions: <pre class="prettyprint"><code>library(dplyr) library(purrr) library(lubridate) map(df %>% filter(positive == "yes") %>% pull(date), ~ df %>% filter(date %within% interval(.x - days(365), .x))) %>% map_dfr(~ .x %>% summarise(date = last(date), chronic = (sum(positive == "yes")-1)/ (n()-1) >= 0.5)) %>% right_join(df, by = "date") %>% arrange(date) %>% mutate(chronic = if_else(is.na(chronic) | !chronic, "no", "yes")) # A tibble: 10 x 3 date chronic positive <chr> <chr> <chr> 1 2005-02-22 no yes 2 2005-04-26 no no 3 2005-08-02 yes yes 4 2005-10-04 no no 5 2005-12-06 yes yes 6 2006-03-14 no no 7 2006-06-06 no no 8 2006-09-12 no yes 9 2006-12-19 no yes 10 2007-03-27 yes yes </code></pre>

Here is one way - <pre class="prettyprint"><code>library(dplyr) library(purrr) dat %>% mutate(chronic = map_chr(row_number(), ~{ inds <- between(date, date[.x] - 365, date[.x] - 1) if(positive[.x] == "yes" && any(inds) && mean(positive[inds] == 'yes') >= 0.5) 'yes' else 'no' })) # date positive chronic #1 2005-02-22 yes no #2 2005-04-26 no no #3 2005-08-02 yes yes #4 2005-10-04 no no #5 2005-12-06 yes yes #6 2006-03-14 no no #7 2006-06-06 no no #8 2006-09-12 yes no #9 2006-12-19 yes no #10 2007-03-27 yes yes </code></pre>

Mutate new column based on moving window of fixed date interval size, in R

Tags:

r

conditional-statements

dplyr

rolling-computation

lag

I have data for one patient in R, which shows dates when they tested positive for a certain condition. The data looks like this:

      date  positive
2005-02-22      yes
2005-04-26       no
2005-08-02      yes
2005-10-04       no
2005-12-06      yes
2006-03-14       no
2006-06-06       no
2006-09-12      yes
2006-12-19      yes
2007-03-27      yes

Now I introduce a new definition. The patient's condition is defined as "chronic positive" if "current test is positive, and >=50% of tests in the prior 365 days were positive". So I want to create an output dataset that tells me whether the patient was chronically positive at each date. For example, the output should look like this (e.g. on 2006-09-12, they are "positive" but not "chronic positive" because 3 out of 4 visits in the previous 365 days were negative):

      date  positive  chronic
2005-02-22      yes        no
2005-04-26       no        no
2005-08-02      yes       yes
2005-10-04       no        no
2005-12-06      yes       yes
2006-03-14       no        no
2006-06-06       no        no
2006-09-12      yes        no
2006-12-19      yes        no
2007-03-27      yes       yes

How can I do this? At each row of interest, I need to be able to look at previous rows (within the last 365 days) and assess what proportion of them were positive. I think I could use a combination of the lead/lag functions and dplyr, but I would appreciate an example of how this can be done.

The original data can be reproduced with:

dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"), 
                      positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")), 
                 row.names = c(NA, 10L), class = "data.frame")

772

asked Jul 10 '21 04:07

bob

4 Answers

You can make use slider library for such rolling computation. Syntax explanation -

slide_index_lgl works on a vector .x and an index .i simultaneously and produces a logical vector output.
.x is used as positive vector
.i is used as date vector
.before and .after are self explanatory (previous 365 days included and current day excluded)
.f is simple where test positivity in previous 365 days is checked
this output is combined with another condition i.e. positive == 'yes' I used this formula (sum(.x == 'yes') / length(.x)) >= 0.5
1 is added to this logical output, giving us 1 for FALSE and 2 for TRUE
This complete output is used as an index for output vector c('No', 'Yes') so that you'll get YesforTRUEandNoforFALSE`

library(tidyverse)

df <- read.table(header = TRUE, text = 'date  positive
2005-02-22      yes
2005-04-26       no
2005-08-02      yes
2005-10-04       no
2005-12-06      yes
2006-03-14       no
2006-06-06       no
2006-09-12      yes
2006-12-19      yes
2007-03-27      yes')

df$date <- as.Date(df$date)

library(slider)
library(lubridate)

df %>%
  mutate(chronic = c('No', "Yes")[1 + (positive == 'yes' &  slide_index_lgl(positive, date, 
                              ~ (sum(.x == 'yes') / length(.x)) >= 0.5  , 
                              .before = days(365), 
                              .after = days(-1)))])

#>          date positive chronic
#> 1  2005-02-22      yes    <NA>
#> 2  2005-04-26       no      No
#> 3  2005-08-02      yes     Yes
#> 4  2005-10-04       no      No
#> 5  2005-12-06      yes     Yes
#> 6  2006-03-14       no      No
#> 7  2006-06-06       no      No
#> 8  2006-09-12      yes      No
#> 9  2006-12-19      yes      No
#> 10 2007-03-27      yes     Yes

Alternative strategy using runner::runner() in baseR

dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"), 
                      positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")), 
                 row.names = c(NA, 10L), class = "data.frame")

library(runner)

dat$chronic <- ifelse(runner(dat$positive, idx = dat$date, lag = '1 day',
                             k = '365 days',
                             f = \(.x) (sum(.x == 'yes')/length(.x)) >= 0.5) & dat$positive == 'yes', 'yes', 'no')
dat
#>          date positive chronic
#> 1  2005-02-22      yes    <NA>
#> 2  2005-04-26       no      no
#> 3  2005-08-02      yes     yes
#> 4  2005-10-04       no      no
#> 5  2005-12-06      yes     yes
#> 6  2006-03-14       no      no
#> 7  2006-06-06       no      no
#> 8  2006-09-12      yes      no
#> 9  2006-12-19      yes      no
#> 10 2007-03-27      yes     yes

110

answered Oct 21 '22 19:10

AnilGoyal

Another option using non-equi join in data.table:

library(data.table)
setDT(dat)[, yrago := date - 365L]
dat[, chronic := fifelse(
    .SD[.SD, on=.(date>=yrago, date<date), 
        by=.EACHI, .N>0 & i.positive=="yes" & sum(x.positive=="yes")/.N >= 0.5]$V1,
    "yes", "no")
]
dat[, yrago := NULL][]

output:

          date positive chronic
 1: 2005-02-22      yes      no
 2: 2005-04-26       no      no
 3: 2005-08-02      yes     yes
 4: 2005-10-04       no      no
 5: 2005-12-06      yes     yes
 6: 2006-03-14       no      no
 7: 2006-06-06       no      no
 8: 2006-09-12      yes      no
 9: 2006-12-19      yes      no
10: 2007-03-27      yes     yes

answered Oct 21 '22 18:10

chinsoon12

You can also use this solution in case you don't want to use rolling functions:

library(dplyr)
library(purrr)
library(lubridate)

map(df %>% 
      filter(positive == "yes") %>% 
      pull(date), ~ df %>% filter(date %within% interval(.x - days(365), .x))) %>% 
  map_dfr(~ .x %>% 
        summarise(date = last(date),
                  chronic = (sum(positive == "yes")-1)/ (n()-1) >= 0.5)) %>%
  right_join(df, by = "date") %>%
  arrange(date) %>%
  mutate(chronic = if_else(is.na(chronic) | !chronic, "no", "yes"))

# A tibble: 10 x 3
   date       chronic positive
   <chr>      <chr>   <chr>   
 1 2005-02-22 no      yes     
 2 2005-04-26 no      no      
 3 2005-08-02 yes     yes     
 4 2005-10-04 no      no      
 5 2005-12-06 yes     yes     
 6 2006-03-14 no      no      
 7 2006-06-06 no      no      
 8 2006-09-12 no      yes     
 9 2006-12-19 no      yes     
10 2007-03-27 yes     yes

answered Oct 21 '22 20:10

Anoushiravan R

Here is one way -

library(dplyr)
library(purrr)

dat %>%
  mutate(chronic = map_chr(row_number(), ~{
    inds <- between(date, date[.x] - 365, date[.x] - 1)
    if(positive[.x] == "yes" && any(inds) && mean(positive[inds] == 'yes') >= 0.5) 'yes' else 'no'
    }))

#         date positive chronic
#1  2005-02-22      yes      no
#2  2005-04-26       no      no
#3  2005-08-02      yes     yes
#4  2005-10-04       no      no
#5  2005-12-06      yes     yes
#6  2006-03-14       no      no
#7  2006-06-06       no      no
#8  2006-09-12      yes      no
#9  2006-12-19      yes      no
#10 2007-03-27      yes     yes

answered Oct 21 '22 19:10

Ronak Shah

Related questions
                            
                                How to compose a list of functions
                            
                                ggplot() scaling with scale::percent_format() producing strange results
                            
                                Plot y = mx + c with ggplot
                            
                                Blogdown kable tables formatting (ugly)
                            
                                Handling empty strings in string detection
                            
                                R shiny dynamic UI in insertUI
                            
                                How to convert a numeric value into a Date value
                            
                                How to filter an R simple features collection using sf methods like st_intersects()?
                            
                                R return true or false per row if string contains any of a list of words
                            
                                How to find the number of times row elements switch from negative to positive (cycles) for each factor level
                            
                                Replacement of plyr::cbind.fill in dplyr?
                            
                                Left-adjust (hjust = 0) vertical x axis labels on facets with free scale
                            
                                How to group rows and get their cell associations layed out in a list form in r?
                            
                                How to establish if the dates in a column are unique?
                            
                                Cumulative product of (1-previous_record)*current_record
                            
                                zsh: command not found: R on terminal using Big Sur Mac
                            
                                How to identify row that matches vector
                            
                                R repeat in column based on value in row
                            
                                R: pass multiple arguments to accumulate/reduce
                            
                                Last observation added forward

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With