
How to create a variable that is the sum of consecutive rows within a given time frame and by id

I am trying to implement, in R, a sum of consecutive values that fall within 365 days of each other, grouped by a unique identifier. For example, for date 1 of a particular ID, we would add the costs of dates 2, 3 and 4 of the same ID (if they fall within 365 days of date 1) to the cost of date 1 to get its total cost. Then for date 2, we would add dates 3 and 4 in the same way, and so on. I have tried several rolling sums (R dplyr rolling sum) and similar dplyr solutions that sum consecutive values under certain constraints (Calculate sum of a column if the difference between consecutive rows meets a condition), but I could not get the code to take the number of days between dates into account. I have included a sample dataset and the solution dataset I am looking for.
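
To pin down the rule, here is a brute-force sketch in base R for a single ID (the helper name forward_window_sum is made up for illustration). Each row's total is the sum of cost over all rows dated on or after it and at most 365 days later, so a row always includes its own cost. Treating the 365-day boundary as inclusive is an assumption: the sample data has no pair of dates exactly 365 days apart, so it does not settle that case.

```r
# Hypothetical helper: brute-force forward window sum for ONE ID.
# Row i's total = sum of cost over rows j with date_i <= date_j <= date_i + window
# (boundary assumed inclusive).
forward_window_sum <- function(dates, cost, window = 365) {
  dates <- as.Date(dates)
  vapply(seq_along(dates), function(i) {
    gap <- as.numeric(dates - dates[i])   # days from row i to every row
    sum(cost[gap >= 0 & gap <= window])   # rows on or after date i, within window
  }, numeric(1))
}

# ID 1 from the sample data
dates_id1 <- c("2014-10-19", "2014-10-24", "2015-01-31",
               "2016-01-20", "2017-06-30", "2017-07-17")
cost_id1 <- c(2000, 14077, 5000, 200, 560, 5000)
forward_window_sum(dates_id1, cost_id1)
#> [1] 21077 19077  5200   200  5560  5000
```

Comparing every pair of rows is quadratic in the number of rows per ID, which is fine at this size.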

Starting Data Set

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
df2<-data.frame(ID,admitdt,cost)

  ID    admitdt  cost
1   1 2014-10-19  2000
2   1 2014-10-24 14077
3   1 2015-01-31  5000
4   1 2016-01-20   200
5   1 2017-06-30   560
6   1 2017-07-17  5000
7   2 2015-04-21   888
8   2 2015-04-22  5959
9   2 2015-05-04  1819
10  2 2015-07-25  7508
11  3 2014-11-11  6406

Solution:

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df2<-data.frame(ID,admitdt,cost,cost365)
  ID    admitdt  cost cost365
1   1 2014-10-19  2000   21077
2   1 2014-10-24 14077   19077
3   1 2015-01-31  5000    5200
4   1 2016-01-20   200     200
5   1 2017-06-30   560    5560
6   1 2017-07-17  5000    5000
7   2 2015-04-21   888   16174
8   2 2015-04-22  5959   15286
9   2 2015-05-04  1819    9327
10  2 2015-07-25  7508    7508
11  3 2014-11-11  6406    6406
asked Dec 06 '25 07:12 by TexasMed


2 Answers

I am giving two methods, one in slider and one in runner. Of these I prefer slider because its syntax is clearer. Nevertheless, the strategy in both is nearly the same:

  • The date column acts as the index in both.
  • slider gives more control because it has before and after arguments; in this case you only need after = days(365) (days() comes from lubridate).
  • In runner, k always looks backwards, so I used lag = -364 to shift the window forward.
  • The rest should be clear. If any further clarification is needed, do ask.
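
The same strategy (the date as a sorted index, with a window that only extends forward) can also be sketched in base R without slider or runner. This is a swapped-in technique, not part of either package: findInterval() locates the first and last row of each window by binary search, and a cumulative sum turns each window total into a difference of two prefix sums. It assumes admitdt is sorted within each ID, as in the sample data; forward_sum_sorted is a hypothetical helper name.

```r
# Hypothetical helper: forward window sum for ONE ID whose dates are sorted.
forward_sum_sorted <- function(dates, cost, window = 365) {
  d <- as.numeric(as.Date(dates))                      # days since 1970-01-01
  first <- findInterval(d, d, left.open = TRUE) + 1L   # first row with date >= d_i
  last <- findInterval(d + window, d)                  # last row with date <= d_i + window
  prefix <- c(0, cumsum(cost))                         # prefix[k + 1] = sum(cost[1:k])
  prefix[last + 1L] - prefix[first]                    # sum(cost[first:last])
}

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <- c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30",
             "2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25",
             "2014-11-11")
cost <- c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)

# ave() splits the row indices by ID and writes each group's result back in place
cost365 <- ave(seq_along(cost), ID,
               FUN = function(rows) forward_sum_sorted(admitdt[rows], cost[rows]))
cost365
```

Because findInterval() needs a non-decreasing vector, unsorted data would have to be ordered by ID and date first.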

In slider you can do

library(tidyverse)

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df<-data.frame(ID,admitdt,cost)

df
#>    ID    admitdt  cost
#> 1   1 2014-10-19  2000
#> 2   1 2014-10-24 14077
#> 3   1 2015-01-31  5000
#> 4   1 2016-01-20   200
#> 5   1 2017-06-30   560
#> 6   1 2017-07-17  5000
#> 7   2 2015-04-21   888
#> 8   2 2015-04-22  5959
#> 9   2 2015-05-04  1819
#> 10  2 2015-07-25  7508
#> 11  3 2014-11-11  6406

library(slider)
library(lubridate)

df %>% group_by(ID) %>%
  mutate(admitdt = as.Date(admitdt),
         cost365 = slider::slide_index_sum(x = cost,
                                           i = admitdt,
                                           after = days(365)))
#> # A tibble: 11 x 4
#> # Groups:   ID [3]
#>       ID admitdt     cost cost365
#>    <dbl> <date>     <dbl>   <dbl>
#>  1     1 2014-10-19  2000   21077
#>  2     1 2014-10-24 14077   19077
#>  3     1 2015-01-31  5000    5200
#>  4     1 2016-01-20   200     200
#>  5     1 2017-06-30   560    5560
#>  6     1 2017-07-17  5000    5000
#>  7     2 2015-04-21   888   16174
#>  8     2 2015-04-22  5959   15286
#>  9     2 2015-05-04  1819    9327
#> 10     2 2015-07-25  7508    7508
#> 11     3 2014-11-11  6406    6406

Or in runner

library(dplyr, warn.conflicts = FALSE)

ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df<-data.frame(ID,admitdt,cost)

library(runner)

df %>% group_by(ID) %>%
  mutate(admitdt = as.Date(admitdt),
         cost365 = runner::sum_run(x = cost,
                                   idx = admitdt,
                                   k = 365,
                                   lag = -364))
#> # A tibble: 11 x 4
#> # Groups:   ID [3]
#>       ID admitdt     cost cost365
#>    <dbl> <date>     <dbl>   <dbl>
#>  1     1 2014-10-19  2000   21077
#>  2     1 2014-10-24 14077   19077
#>  3     1 2015-01-31  5000    5200
#>  4     1 2016-01-20   200     200
#>  5     1 2017-06-30   560    5560
#>  6     1 2017-07-17  5000    5000
#>  7     2 2015-04-21   888   16174
#>  8     2 2015-04-22  5959   15286
#>  9     2 2015-05-04  1819    9327
#> 10     2 2015-07-25  7508    7508
#> 11     3 2014-11-11  6406    6406

Created on 2021-07-19 by the reprex package (v2.0.0)

answered Dec 08 '25 20:12 by AnilGoyal


Here's an approach with purrr::map:

library(dplyr); library(purrr)
df2 %>%
  mutate(admitdt = as.Date(admitdt)) %>%
  group_by(ID) %>%
  # for each admission date .x, sum the costs of the same ID dated
  # from .x up to 365 days after .x
  mutate(cost365 = map_dbl(admitdt,~sum(cost[(.x - admitdt) <= 0 &
                                             (.x - admitdt) >= -365])))
# A tibble: 11 x 4
# Groups:   ID [3]
      ID admitdt     cost cost365
   <dbl> <date>     <dbl>   <dbl>
 1     1 2014-10-19  2000   21077
 2     1 2014-10-24 14077   19077
 3     1 2015-01-31  5000    5200
 4     1 2016-01-20   200     200
 5     1 2017-06-30   560    5560
 6     1 2017-07-17  5000    5000
 7     2 2015-04-21   888   16174
 8     2 2015-04-22  5959   15286
 9     2 2015-05-04  1819    9327
10     2 2015-07-25  7508    7508
11     3 2014-11-11  6406    6406
answered Dec 08 '25 21:12 by Ian Campbell
