Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional cumsum with reset

I have a data frame, the data frame is already sorted as needed, but now I will like to "slice it" in groups.

This groups should have a max cumulative value of 10. When the cumulative value is > 10, it should reset the cumulative sum and start over again

library(dplyr)
id <- sample(1:15)
order <- 1:15
value  <- c(4, 5, 7, 3, 8, 1, 2, 5, 3, 6, 2, 6, 3, 1, 4)
df  <- data.frame(id, order, value)
df

This is the output I'm looking for(I did it "manually")

cumsum_10  <- c(4, 9, 7, 10, 8, 9, 2, 7, 10, 6, 8, 6, 9, 10, 4)
group_10 <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7)
df1  <- data.frame(df, cumsum_10, group_10)
df1

So I'm having 2 problems

  1. How to create a cumulative variable that resets everytime it passes an upper limit (10 in this case)
  2. How to count/group every group

For the first part I was trying some combinations of group_by and cumsum with no luck

df1 <- df %>% group_by(cumsum(c(False, value < 10)))

I would prefer a pipe (%>%) solution instead of a for loop

Thanks

like image 265
Iair Kleiman Avatar asked Dec 30 '15 14:12

Iair Kleiman


3 Answers

I think this is not easily vectorizable.... at least i do not know how.

You can do it by hand via:

my_cumsum <- function(x){
  grp = integer(length(x))
  grp[1] = 1
  for(i in 2:length(x)){
    if(x[i-1] + x[i] <= 10){
      grp[i] = grp[i-1]
      x[i] = x[i-1] + x[i]
    } else {
      grp[i] = grp[i-1] + 1
    }
  }
  data.frame(grp, x)
}

For your data this gives:

> my_cumsum(df$value)
   grp  x
1    1  4
2    1  9
3    2  7
4    2 10
5    3  8
6    3  9
7    4  2
8    4  7
9    4 10
10   5  6
11   5  8
12   6  6
13   6  9
14   6 10
15   7  4

Also for my "counter-example" this gives:

> my_cumsum(c(10,6,4))
  grp  x
1   1 10
2   2  6
3   2 10

As @Khashaa pointed out this can be implementet more efficiently via Rcpp. He linked to this answer How to speed up or vectorize a for loop? which i find very useful

like image 104
Rentrop Avatar answered Nov 16 '22 13:11

Rentrop


You could define your own function and then use it inside dplyr's mutate statement as follows:

df %>% group_by() %>%
  mutate(
    cumsum_10 = cumsum_with_reset(value, 10),
    group_10 = cumsum_with_reset_group(value, 10)
  ) %>% 
  ungroup()

The cumsum_with_reset() function takes a column and a threshold value which resets the sum. cumsum_with_reset_group() is similar but identifies rows that have been grouped together. Definitions are as follows:

# group rows based on cumsum with reset
cumsum_with_reset_group <- function(x, threshold) {
  cumsum <- 0
  group <- 1
  result <- numeric()

  for (i in 1:length(x)) {
    cumsum <- cumsum + x[i]

    if (cumsum > threshold) {
      group <- group + 1
      cumsum <- x[i]
    }

    result = c(result, group)

  }

  return (result)
}

# cumsum with reset
cumsum_with_reset <- function(x, threshold) {
  cumsum <- 0
  group <- 1
  result <- numeric()

  for (i in 1:length(x)) {
    cumsum <- cumsum + x[i]

    if (cumsum > threshold) {
      group <- group + 1
      cumsum <- x[i]
    }

    result = c(result, cumsum)

  }

  return (result)
}

# use functions above as window functions inside mutate statement
df %>% group_by() %>%
  mutate(
    cumsum_10 = cumsum_with_reset(value, 10),
    group_10 = cumsum_with_reset_group(value, 10)
  ) %>% 
  ungroup()
like image 30
Josh Gilfillan Avatar answered Nov 16 '22 13:11

Josh Gilfillan


This can be done easily with purrr::accumulate

library(dplyr)
library(purrr)

df %>% mutate(cumsum_10 = accumulate(value, ~ifelse(.x + .y <= 10, .x + .y, .y)),
              group_10 = cumsum(value == cumsum_10))

   id order value cumsum_10 group_10
1   8     1     4         4        1
2  13     2     5         9        1
3   7     3     7         7        2
4   1     4     3        10        2
5   4     5     8         8        3
6  10     6     1         9        3
7  12     7     2         2        4
8   2     8     5         7        4
9  15     9     3        10        4
10 11    10     6         6        5
11 14    11     2         8        5
12  3    12     6         6        6
13  5    13     3         9        6
14  9    14     1        10        6
15  6    15     4         4        7
like image 5
AnilGoyal Avatar answered Nov 16 '22 15:11

AnilGoyal