How to create a variable that is the sum of consecutive rows within a given time frame and by id

Question

I am trying to implement a sum of consecutive values that fall with 365 days of each other, grouped by a unique identifier in R. For example, for date 1 of a particular ID, we would add dates 2,3,4 (fall within 365 days) of the same ID to get a total cost for date 1. Then for date 2, we would add 3 and 4 to get that total cost and so on. I have tried several rolling sums (R dplyr rolling sum) and similar solutions from dplyr that take sum consecutive values (Calculate sum of a column if the difference between consecutive rows meets a condition) with certain constraints but could not get the code to distinguish the amount of days. I have included a sample dataset with a solution dataset for an example that I am looking for.

Starting Data Set

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df2<-data.frame(ID,admitdt,cost,cost365)

  ID    admitdt  cost
1   1 2014-10-19  2000
2   1 2014-10-24 14077
3   1 2015-01-31  5000
4   1 2016-01-20   200
5   1 2017-06-30   560
6   1 2017-07-17  5000
7   2 2015-04-21   888
8   2 2015-04-22  5959
9   2 2015-05-04  1819
10  2 2015-07-25  7508
11  3 2014-11-11  6406

Solution:

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,500,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df2<-data.frame(ID,admitdt,cost,cost365)
  ID    admitdt  cost cost365
1   1 2014-10-19  2000   21077
2   1 2014-10-24 14077   19077
3   1 2015-01-31  5000    5200
4   1 2016-01-20   200     200
5   1 2017-06-30   560    5560
6   1 2017-07-17  5000    5000
7   2 2015-04-21   888   16174
8   2 2015-04-22  5959   15286
9   2 2015-05-04  1819    9327
10  2 2015-07-25  7508    7508
11  3 2014-11-11  6406    6406

AnilGoyal · Accepted Answer

I am giving 2 methods each in slider and runner. Out of these I like slider because of its clarity of syntax. Neverthess, the strategy in both are nearly same,

date column will act as index in both.
slider gives more control becuase it has .before and .after agruments, which in the instant case you need only after = days(365) (that is integrated with lubridate)
in runner k is always backwards so I used -364 there.
Rest is clear. Still If any further clarification is needed, do ask.

In slider you can do

library(tidyverse)

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df<-data.frame(ID,admitdt,cost)

df
#>    ID    admitdt  cost
#> 1   1 2014-10-19  2000
#> 2   1 2014-10-24 14077
#> 3   1 2015-01-31  5000
#> 4   1 2016-01-20   200
#> 5   1 2017-06-30   560
#> 6   1 2017-07-17  5000
#> 7   2 2015-04-21   888
#> 8   2 2015-04-22  5959
#> 9   2 2015-05-04  1819
#> 10  2 2015-07-25  7508
#> 11  3 2014-11-11  6406

library(slider)
library(lubridate)

df %>% group_by(ID) %>%
  mutate(admitdt = as.Date(admitdt),
              cost365 = slider::slide_index_sum(x = cost,
                                                i = admitdt,
                                                after = days(365)))
#> # A tibble: 11 x 4
#> # Groups:   ID [3]
#>       ID admitdt     cost cost365
#>    <dbl> <date>     <dbl>   <dbl>
#>  1     1 2014-10-19  2000   21077
#>  2     1 2014-10-24 14077   19077
#>  3     1 2015-01-31  5000    5200
#>  4     1 2016-01-20   200     200
#>  5     1 2017-06-30   560    5560
#>  6     1 2017-07-17  5000    5000
#>  7     2 2015-04-21   888   16174
#>  8     2 2015-04-22  5959   15286
#>  9     2 2015-05-04  1819    9327
#> 10     2 2015-07-25  7508    7508
#> 11     3 2014-11-11  6406    6406

Or in runner

library(dplyr, warn.conflicts = F)

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df<-data.frame(ID,admitdt,cost)

library(runner)

df %>% group_by(ID) %>%
  mutate(admitdt = as.Date(admitdt),
         cost365 = runner::sum_run(x = cost,
                                   idx = admitdt,
                                   k = 365,
                                   lag = -364))
#> # A tibble: 11 x 4
#> # Groups:   ID [3]
#>       ID admitdt     cost cost365
#>    <dbl> <date>     <dbl>   <dbl>
#>  1     1 2014-10-19  2000   21077
#>  2     1 2014-10-24 14077   19077
#>  3     1 2015-01-31  5000    5200
#>  4     1 2016-01-20   200     200
#>  5     1 2017-06-30   560    5560
#>  6     1 2017-07-17  5000    5000
#>  7     2 2015-04-21   888   16174
#>  8     2 2015-04-22  5959   15286
#>  9     2 2015-05-04  1819    9327
#> 10     2 2015-07-25  7508    7508
#> 11     3 2014-11-11  6406    6406

^{Created on 2021-07-19 by the reprex package (v2.0.0)}

Ian Campbell · Answer

Here's an approach with purrr::map:

library(dplyr); library(purrr)
df2 %>%
  mutate(admitdt = as.Date(admitdt)) %>%
  group_by(ID) %>%
  mutate(cost365 = map_dbl(admitdt,~sum(cost[(.x - admitdt) <= 0 &
                                             (.x - admitdt) >= -365])))
# A tibble: 11 x 4
# Groups:   ID [3]
      ID admitdt     cost cost365
   <dbl> <date>     <dbl>   <dbl>
 1     1 2014-10-19  2000   21077
 2     1 2014-10-24 14077   19077
 3     1 2015-01-31  5000    5200
 4     1 2016-01-20   200     200
 5     1 2017-06-30   560    1060
 6     1 2017-07-17   500     500
 7     2 2015-04-21   888   16174
 8     2 2015-04-22  5959   15286
 9     2 2015-05-04  1819    9327
10     2 2015-07-25  7508    7508
11     3 2014-11-11  6406    6406

How to create a variable that is the sum of consecutive rows within a given time frame and by id

Tags:

r

group-by

sum

runner

rolling-computation

TexasMed

2 Answers

AnilGoyal

Ian Campbell

Recent Activity

Donate For Us

How to create a variable that is the sum of consecutive rows within a given time frame and by id

Tags:

r

group-by

sum

runner

rolling-computation

TexasMed

2 Answers

AnilGoyal

Ian Campbell

Related questions

Recent Activity

Donate For Us