Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group numeric vector by predefined maximal group sum

I have a numeric vector like this x <- c(1, 23, 7, 10, 9, 2, 4) and I want to group the elements from left to right with the constrain that each group sum must not exceed 25. Thus, here the first group is c(1, 23), the second is c(7, 10) and the last c(9, 2, 4). the expected output is a dataframe with a second column containing the groups:

data.frame(x= c(1, 23,  7,  10,  9,  2,  4), group= c(1, 1, 2, 2, 3, 3, 3))

I have tried different things with cumsum but am not able to kind of dynamically restart cumsum for the new group once the limit sum of 25 for the last group is reached.

like image 976
Igor stands with Ukraine Avatar asked May 08 '26 00:05

Igor stands with Ukraine


2 Answers

I think cpp function is the fastest way:

library(Rcpp)
cppFunction(
    "IntegerVector GroupBySum(const NumericVector& x, const double& max_sum = 25)
    {
        double sum = 0;
        int cnt = 0;
        int period = 1;
        IntegerVector res(x.size());
        for (int i = 0; i < x.size(); ++i)
        {
            ++cnt;
            sum += x[i];
            if (sum > max_sum)
            {
                sum = x[i];
                if (cnt > 1)
                    ++period;
                cnt = 1;
            }
            res[i] = period;
        }
        return res;
    }"
)
GroupBySum(c(1, 23,  7,  10,  9,  2,  4), 25)
like image 128
Егор Шишунов Avatar answered May 10 '26 18:05

Егор Шишунов


We can try this as a programming practice if you like :)

f1 <- function(x) {
  group <- c()
  while (length(x)) {
    idx <- cumsum(x) <= 25
    x <- x[!idx]
    group <- c(group, rep(max(group, 0) + 1, sum(idx)))
  }
  group
}

or

f2 <- function(x) {
  group <- c()
  g <- 0
  while (length(x)) {
    cnt <- s <- 0
    for (i in seq_along(x)) {
      s <- s + x[i]
      if (s <= 25) {
        cnt <- cnt + 1
      } else {
        break
      }
    }
    g <- g + 1
    group <- c(group, rep(g, cnt))
    x <- x[-(1:cnt)]
  }
  group
}

or

f3 <- function(x) {
  s <- cumsum(x)
  r <- c()
  grp <- 1
  while (length(s)) {
    idx <- (s <= 25)
    r <- c(r, rep(grp, sum(idx)))
    grp <- grp + 1
    s <- s[!idx] - tail(s[idx], 1)
  }
  r
}

which gives

[1] 1 1 2 2 3 3 3

and benchmarking among them looks like

set.seed(1)
set.seed(1)
x <- runif(1e3, 0, 25)
bm <- microbenchmark(
  f1(x),
  f2(x),
  f3(x),
  check = "equivalent"
)
autoplot(bm)

enter image description here


Recursion version

Another option is using recursion (based on f1())

f <- function(x, res = c()) {
  if (!length(x)) {
    return(res)
  }
  idx <- cumsum(x) <= 25
  Recall(x[!idx], res = c(res, list(x[idx])))
}

and you will see

> f(x)
[[1]]
[1]  1 23

[[2]]
[1]  7 10

[[3]]
[1] 9 2 4
like image 20
ThomasIsCoding Avatar answered May 10 '26 18:05

ThomasIsCoding



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!