I'd like to generate cumulative sums with a reset if the "current" sum exceeds some threshold, using dplyr. In the below, I want to cumsum over 'a'.
library(dplyr)
library(tibble)
tib <- tibble(
  t = c(1,2,3,4,5,6),
  a = c(2,3,1,2,2,3)
)
# what I want
## thresh = 5
# A tibble: 6 x 4
#         t     a     g     c
#      <dbl> <dbl> <int> <dbl>
#   1  1.00  2.00     0  2.00
#   2  2.00  3.00     0  5.00
#   3  3.00  1.00     1  1.00
#   4  4.00  2.00     1  3.00
#   5  5.00  2.00     1  5.00
#   6  6.00  3.00     2  3.00
# what I want
## thresh = 4
# A tibble: 6 x 4
#         t     a     g     c
#      <dbl> <dbl> <int> <dbl>
#   1  1.00  2.00     0  2.00
#   2  2.00  3.00     0  5.00
#   3  3.00  1.00     1  1.00
#   4  4.00  2.00     1  3.00
#   5  5.00  2.00     1  5.00
#   6  6.00  3.00     2  3.00
# what I want
## thresh = 6
# A tibble: 6 x 4
#         t     a     g     c
#      <dbl> <dbl> <int> <dbl>
#   1  1.00  2.00     0  2.00
#   2  2.00  3.00     0  5.00
#   3  3.00  1.00     0  6.00
#   4  4.00  2.00     1  2.00
#   5  5.00  2.00     1  4.00
#   6  6.00  3.00     1  7.00
I've examined many of the similar questions here (such as resetting cumsum if value goes to negative in r) and have gotten what I hoped was close, but no.
I've tried variants of
thresh <-5
tib %>%
  group_by(g = cumsum(lag(cumsum(a) >= thresh, default = FALSE))) %>%
  mutate(c = cumsum(a)) %>%
  ungroup()
which returns
# A tibble: 6 x 4
      t     a     g     c
  <dbl> <dbl> <int> <dbl>
1  1.00  2.00     0  2.00
2  2.00  3.00     0  5.00
3  3.00  1.00     1  1.00
4  4.00  2.00     2  2.00
5  5.00  2.00     3  2.00
6  6.00  3.00     4  3.00
You can see that the "group" is not getting reset after the first time.
I think you can use accumulate() here to help. And i've also made a wrapper function to use for different thresholds
sum_reset_at <- function(thresh) {
  function(x) {
    accumulate(x, ~if_else(.x>=thresh, .y, .x+.y))
  }  
}
tib %>% mutate(c = sum_reset_at(5)(a))
#       t     a     c
#   <dbl> <dbl> <dbl>
# 1     1     2     2
# 2     2     3     5
# 3     3     1     1
# 4     4     2     3
# 5     5     2     5
# 6     6     3     3
tib %>% mutate(c = sum_reset_at(4)(a))
#       t     a     c
#   <dbl> <dbl> <dbl>
# 1     1     2     2
# 2     2     3     5
# 3     3     1     1
# 4     4     2     3
# 5     5     2     5
# 6     6     3     3
tib %>% mutate(c = sum_reset_at(6)(a))
#       t     a     c
#   <dbl> <dbl> <dbl>
# 1     1     2     2
# 2     2     3     5
# 3     3     1     6
# 4     4     2     2
# 5     5     2     4
# 6     6     3     7
                        if you're interested in the group building based on cumsum < threshold
You can use the following base:: function:
cumSumReset <- function(x, thresh = 4) {
    ans    <- numeric()
    i      <- 0
    while(length(x) > 0) {
        cs_over <- cumsum(x)
        ntimes <- sum( cs_over <= thresh )
        x      <- x[-(1:ntimes)]
        ans <- c(ans, rep(i, ntimes))
        i   <- i + 1
    }
    return(ans)
}
call:
tib %>% mutate(g = cumSumReset(a, 5))
result:
#   A tibble: 6 x 3
#      t     a     g
#  <dbl> <dbl> <dbl>
#1     1     2     0
#2     2     3     0
#3     3     1     1
#4     4     2     1
#5     5     2     1
#6     6     3     2
g you can now do whatever you like.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With