The function below assigns the values of a vector to groups: whenever adding a value would push the cumulative sum above a given max value, a new group starts and the sum starts over.
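cs_group <- function(x, threshold) {
  running_sum <- 0   # renamed from `cumsum` to avoid shadowing base::cumsum
  group <- 1
  result <- numeric()
  for (i in seq_along(x)) {   # seq_along() is safe for zero-length input
    running_sum <- running_sum + x[i]
    if (running_sum > threshold) {
      # x[i] does not fit anymore: it opens the next group
      group <- group + 1
      running_sum <- x[i]
    }
    result <- c(result, group)
  }
  return(result)
}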
The max value in the example is 10. The first group includes only 9, because adding the next value would give a sum of 12. The next group includes 3, 2, 2 (adding 8 would give a value higher than 10).
test <- c(9, 3, 2, 2, 8, 5, 4, 9, 1)
cs_group(test, 10)
[1] 1 2 2 2 3 4 4 5 5
However, I would prefer each group to also include the value that pushes the cumulative sum above the maximum value of 10.
Ideal result:
[1] 1 1 2 2 2 3 3 3 4
You can write your own custom function or use the code written by others.
I had the exact same problem a few days back, and a solution has since been included in the MESS package.
devtools::install_github("ekstroem/MESS")
MESS::cumsumbinning(test, 10, cutwhenpassed = TRUE)
#[1] 1 1 2 2 2 3 3 3 4
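One purrr approach could be (note that the groups come out numbered from 0, so add 1 if you need them to start at 1):
library(purrr)
cumsum(c(FALSE, diff(accumulate(test, ~ ifelse(.x >= 10, .y, .x + .y))) <= 0))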
[1] 0 0 1 1 1 2 2 2 3
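The same idea can be sketched in base R with Reduce(), in case you would rather not depend on purrr. This is only a sketch under a few assumptions: the function name group_by_running_sum is made up for illustration, it tests for "higher than" the threshold (>) rather than >= as in the one-liner above, and it marks a new group right after the running sum has passed the threshold instead of relying on diff().

```r
# Hypothetical helper, not part of any package
group_by_running_sum <- function(x, threshold) {
  # Running sum that restarts once the previous total has passed the threshold
  running <- Reduce(function(acc, y) if (acc > threshold) y else acc + y,
                    x, accumulate = TRUE)
  # A new group starts right after each position where the sum passed the threshold
  starts <- c(FALSE, head(running, -1) > threshold)
  cumsum(starts) + 1   # + 1 so that group labels start at 1
}

test <- c(9, 3, 2, 2, 8, 5, 4, 9, 1)
group_by_running_sum(test, 10)
# [1] 1 1 2 2 2 3 3 3 4
```

For this test vector no running sum is exactly 10, so > and >= give the same grouping; they would differ only when a group's sum lands exactly on the threshold.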
For your purpose, your cs_group can be rewritten as below (if I understand the logic behind it correctly):
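cs_group <- function(x, threshold) {
  group <- 1
  r <- c()
  repeat {
    if (length(x) == 0) break
    # Last position whose running sum is still within the threshold
    # (0 when even the first value already exceeds it)
    idx <- max(0, which(cumsum(x) <= threshold))
    # Also take the next value, the one that pushes the sum past the threshold
    cnt <- idx + ifelse(idx == length(x), 0, 1)
    r <- c(r, rep(group, cnt))
    x <- x[-(1:cnt)]   # drop the values that were just grouped
    group <- group + 1
  }
  r
}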
such that
test <- c(9, 3, 2, 2, 8, 5, 4, 9, 1)
cs_group(test, 10)
[1] 1 1 2 2 2 3 3 3 4
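Another option is to stay close to the loop from the question and only change when the group counter is bumped: label the current value first, then start a new group once the running sum has passed the threshold. The sketch below assumes that is the intended rule; the name cs_group_inclusive is made up for illustration.

```r
# Hypothetical variant of the question's loop, not taken from any package
cs_group_inclusive <- function(x, threshold) {
  running_sum <- 0
  group <- 1
  result <- numeric(length(x))   # preallocate instead of growing with c()
  for (i in seq_along(x)) {
    result[i] <- group           # the current value always joins the current group
    running_sum <- running_sum + x[i]
    if (running_sum > threshold) {
      # the threshold has been passed: the next value opens a new group
      group <- group + 1
      running_sum <- 0
    }
  }
  result
}

test <- c(9, 3, 2, 2, 8, 5, 4, 9, 1)
cs_group_inclusive(test, 10)
# [1] 1 1 2 2 2 3 3 3 4
```

It makes a single pass over x and does not recompute cumsum() on the remaining vector for every group, which may matter for long inputs.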