Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mutate new column based on moving window of fixed date interval size, in R

I have data for one patient in R, which shows dates when they tested positive for a certain condition. The data looks like this:

      date  positive
2005-02-22      yes
2005-04-26       no
2005-08-02      yes
2005-10-04       no
2005-12-06      yes
2006-03-14       no
2006-06-06       no
2006-09-12      yes
2006-12-19      yes
2007-03-27      yes

Now I introduce a new definition. The patient's condition is defined as "chronic positive" if "current test is positive, and >=50% of tests in the prior 365 days were positive". So I want to create an output dataset that tells me whether the patient was chronically positive at each date. For example, the output should look like this (e.g. on 2006-09-12, they are "positive" but not "chronic positive" because 3 out of 4 visits in the previous 365 days were negative):

      date  positive  chronic
2005-02-22      yes        no
2005-04-26       no        no
2005-08-02      yes       yes
2005-10-04       no        no
2005-12-06      yes       yes
2006-03-14       no        no
2006-06-06       no        no
2006-09-12      yes        no
2006-12-19      yes        no
2007-03-27      yes       yes

How can I do this? At each row of interest, I need to be able to look at previous rows (within the last 365 days) and assess what proportion of them were positive. I think I could use a combination of the lead/lag functions and dplyr, but I would appreciate an example of how this can be done.

The original data can be reproduced with:

dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"), 
                      positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")), 
                 row.names = c(NA, 10L), class = "data.frame")
like image 772
bob Avatar asked Jul 10 '21 04:07

bob


People also ask

How do you mutate a variable in R?

To use mutate in R, all you need to do is call the function, specify the dataframe, and specify the name-value pair for the new variable you want to create. The explanation I just gave is pretty straightforward, but to make it more concrete, let’s work with some actual data.

How to add a new column to a data frame using mutate?

How to add a new column to a data frame using mutate in R? How to add a new column to a data frame using mutate in R? The mutate function of dplyr package in R can help us to add a new column to a data frame and the benefit of using mutate is that we can decide the position of the new column during the addition.

What is the difference between mutate and transmute in R?

Source: R/manip.r. mutate.Rd. mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. Both functions preserve the number of rows of the input. New variables overwrite existing variables of the same name.

How to add a new column to a Dataframe in R?

The mutate function of dplyr package in R can help us to add a new column to a data frame and the benefit of using mutate is that we can decide the position of the new column during the addition. For example, if we have a data frame called df that contains three columns say x, y, a then we can add a new column say z after y using mutate function.


4 Answers

You can make use slider library for such rolling computation. Syntax explanation -

  • slide_index_lgl works on a vector .x and an index .i simultaneously and produces a logical vector output.
  • .x is used as positive vector
  • .i is used as date vector
  • .before and .after are self explanatory (previous 365 days included and current day excluded)
  • .f is simple where test positivity in previous 365 days is checked
  • this output is combined with another condition i.e. positive == 'yes' I used this formula (sum(.x == 'yes') / length(.x)) >= 0.5
  • 1 is added to this logical output, giving us 1 for FALSE and 2 for TRUE
  • This complete output is used as an index for output vector c('No', 'Yes') so that you'll get YesforTRUEandNoforFALSE`
library(tidyverse)

df <- read.table(header = TRUE, text = 'date  positive
2005-02-22      yes
2005-04-26       no
2005-08-02      yes
2005-10-04       no
2005-12-06      yes
2006-03-14       no
2006-06-06       no
2006-09-12      yes
2006-12-19      yes
2007-03-27      yes')

df$date <- as.Date(df$date)

library(slider)
library(lubridate)

df %>%
  mutate(chronic = c('No', "Yes")[1 + (positive == 'yes' &  slide_index_lgl(positive, date, 
                              ~ (sum(.x == 'yes') / length(.x)) >= 0.5  , 
                              .before = days(365), 
                              .after = days(-1)))])

#>          date positive chronic
#> 1  2005-02-22      yes    <NA>
#> 2  2005-04-26       no      No
#> 3  2005-08-02      yes     Yes
#> 4  2005-10-04       no      No
#> 5  2005-12-06      yes     Yes
#> 6  2006-03-14       no      No
#> 7  2006-06-06       no      No
#> 8  2006-09-12      yes      No
#> 9  2006-12-19      yes      No
#> 10 2007-03-27      yes     Yes

Alternative strategy using runner::runner() in baseR

dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"), 
                      positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")), 
                 row.names = c(NA, 10L), class = "data.frame")

library(runner)

dat$chronic <- ifelse(runner(dat$positive, idx = dat$date, lag = '1 day',
                             k = '365 days',
                             f = \(.x) (sum(.x == 'yes')/length(.x)) >= 0.5) & dat$positive == 'yes', 'yes', 'no')
dat
#>          date positive chronic
#> 1  2005-02-22      yes    <NA>
#> 2  2005-04-26       no      no
#> 3  2005-08-02      yes     yes
#> 4  2005-10-04       no      no
#> 5  2005-12-06      yes     yes
#> 6  2006-03-14       no      no
#> 7  2006-06-06       no      no
#> 8  2006-09-12      yes      no
#> 9  2006-12-19      yes      no
#> 10 2007-03-27      yes     yes
like image 110
AnilGoyal Avatar answered Oct 21 '22 19:10

AnilGoyal


Another option using non-equi join in data.table:

library(data.table)
setDT(dat)[, yrago := date - 365L]
dat[, chronic := fifelse(
    .SD[.SD, on=.(date>=yrago, date<date), 
        by=.EACHI, .N>0 & i.positive=="yes" & sum(x.positive=="yes")/.N >= 0.5]$V1,
    "yes", "no")
]
dat[, yrago := NULL][]

output:

          date positive chronic
 1: 2005-02-22      yes      no
 2: 2005-04-26       no      no
 3: 2005-08-02      yes     yes
 4: 2005-10-04       no      no
 5: 2005-12-06      yes     yes
 6: 2006-03-14       no      no
 7: 2006-06-06       no      no
 8: 2006-09-12      yes      no
 9: 2006-12-19      yes      no
10: 2007-03-27      yes     yes
like image 34
chinsoon12 Avatar answered Oct 21 '22 18:10

chinsoon12


You can also use this solution in case you don't want to use rolling functions:

library(dplyr)
library(purrr)
library(lubridate)

map(df %>% 
      filter(positive == "yes") %>% 
      pull(date), ~ df %>% filter(date %within% interval(.x - days(365), .x))) %>% 
  map_dfr(~ .x %>% 
        summarise(date = last(date),
                  chronic = (sum(positive == "yes")-1)/ (n()-1) >= 0.5)) %>%
  right_join(df, by = "date") %>%
  arrange(date) %>%
  mutate(chronic = if_else(is.na(chronic) | !chronic, "no", "yes"))

# A tibble: 10 x 3
   date       chronic positive
   <chr>      <chr>   <chr>   
 1 2005-02-22 no      yes     
 2 2005-04-26 no      no      
 3 2005-08-02 yes     yes     
 4 2005-10-04 no      no      
 5 2005-12-06 yes     yes     
 6 2006-03-14 no      no      
 7 2006-06-06 no      no      
 8 2006-09-12 no      yes     
 9 2006-12-19 no      yes     
10 2007-03-27 yes     yes 
like image 3
Anoushiravan R Avatar answered Oct 21 '22 20:10

Anoushiravan R


Here is one way -

library(dplyr)
library(purrr)

dat %>%
  mutate(chronic = map_chr(row_number(), ~{
    inds <- between(date, date[.x] - 365, date[.x] - 1)
    if(positive[.x] == "yes" && any(inds) && mean(positive[inds] == 'yes') >= 0.5) 'yes' else 'no'
    }))

#         date positive chronic
#1  2005-02-22      yes      no
#2  2005-04-26       no      no
#3  2005-08-02      yes     yes
#4  2005-10-04       no      no
#5  2005-12-06      yes     yes
#6  2006-03-14       no      no
#7  2006-06-06       no      no
#8  2006-09-12      yes      no
#9  2006-12-19      yes      no
#10 2007-03-27      yes     yes
like image 2
Ronak Shah Avatar answered Oct 21 '22 19:10

Ronak Shah