 

Cumulative sum that resets when 0 is encountered

Tags:

r

I would like to do a cumulative sum on a field but reset the aggregated value whenever a 0 is encountered.

Here is an example of what I want:

df <- data.frame(campaign  = letters[1:4],
                 date      = c("jan", "feb", "march", "april"),
                 b         = c(1, 0, 1, 1),
                 whatiwant = c(1, 0, 1, 2))
df

 campaign  date b whatiwant
1        a   jan 1         1
2        b   feb 0         0
3        c march 1         1
4        d april 1         2
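To pin down the desired behaviour, here is a plain loop that restarts the running total at every 0. It is only a reference sketch for checking the faster answers against; the function name `reset_cumsum_loop` is hypothetical and does not come from the question or the answers.

```r
# Hypothetical reference implementation: running total that restarts
# from 0 whenever a zero is encountered. Easy to read, but slow on long
# vectors because the loop runs element by element in interpreted R.
reset_cumsum_loop <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    # a zero clears the running total; any other value is added to it
    total <- if (x[i] == 0) 0 else total + x[i]
    out[i] <- total
  }
  out
}

reset_cumsum_loop(c(1, 0, 1, 1))
## [1] 1 0 1 2
```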
asked Sep 10 '15 12:09 by patpat




2 Answers

Another late idea:

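ff = function(x)
{
    # running total over the whole vector
    cs = cumsum(x)
    # subtract the running total reached at the most recent zero;
    # cummax() finds it only if cs never decreases, i.e. x >= 0
    cs - cummax((x == 0) * cs)
}
ff(c(0, 1, 3, 0, 0, 5, 2))
#[1] 0 1 4 0 0 5 7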
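The `cummax()` trick in `ff` relies on the running total never decreasing, which holds for the question's 0/1 column but not for data with negative values: for example `ff(c(1, -5, 0, 2))` gives `1 -4 -4 -2` instead of `1 -4 0 2`. Below is a sketch of a variant that takes `cummax()` over *positions* rather than totals, so it looks up the total at the most recent zero directly. The name `ff2` is hypothetical and this is not part of the original answer.

```r
# Hypothetical variant of ff() that does not assume x >= 0.
ff2 <- function(x) {
  cs <- cumsum(x)
  # for each position, the index of the most recent zero at or before it
  # (0 when no zero has been seen yet)
  last_zero <- cummax(seq_along(x) * (x == 0))
  # c(0, cs) shifts by one so that "no zero yet" maps to a base of 0
  cs - c(0, cs)[last_zero + 1]
}

ff2(c(0, 1, 3, 0, 0, 5, 2))
## [1] 0 1 4 0 0 5 7
ff2(c(1, -5, 0, 2))
## [1]  1 -4  0  2
```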

And to compare it with a data.table version:

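library(data.table)
ffdt = function(x)
    data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)]$whatiwant

## test vector (assumed to be the one from the benchmark in the other answer)
set.seed(123)
x = sample(0:1e3, 1e7, replace = TRUE)
x = as.numeric(x) ## ff() takes one cumsum() over all 1e7 values, which overflows the integer range
identical(ff(x), ffdt(x))
#[1] TRUE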
microbenchmark::microbenchmark(ff(x), ffdt(x), times = 25)
#Unit: milliseconds
#    expr      min       lq   median       uq      max neval
#   ff(x) 315.8010 362.1089 372.1273 386.3892 405.5218    25
# ffdt(x) 374.6315 407.2754 417.6675 447.8305 534.8153    25
answered Sep 28 '22 02:09 by alexis_laz


Another base R option would be simply:

with(df, ave(b, cumsum(b == 0), FUN = cumsum))
## [1] 1 0 1 2

This splits column b into groups, starting a new group at every 0, and computes the cumulative sum of b within each group.
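To make the grouping visible, here is the same computation written out step by step on the question's `b` column: the group id, then the split / cumulative sum / unsplit that `ave()` performs internally. This is an illustrative sketch, not part of the original answer.

```r
b <- c(1, 0, 1, 1)

# Group id: increases by one at every zero, so each zero opens a new group
g <- cumsum(b == 0)
g
## [1] 0 1 1 1

# What ave(b, g, FUN = cumsum) does, spelled out: split b by group,
# take the cumulative sum within each group, then reassemble the pieces
# in the original order
pieces <- lapply(split(b, g), cumsum)
unsplit(pieces, g)
## [1] 1 0 1 2
```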


Another solution, using data.table (v 1.9.6+ is needed for rleid()):

library(data.table) ## v 1.9.6+
setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]
#    campaign  date b whatiwant
# 1:        a   jan 1         1
# 2:        b   feb 0         0
# 3:        c march 1         1
# 4:        d april 1         2
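`rleid(b == 0L)` numbers the runs of consecutive zero and non-zero values, so every zero starts a new run and the run after it restarts the sum. As a sketch of what that grouping looks like without data.table, base R's `rle()` can build the same run ids for a single vector without NAs; this base-R stand-in is an illustration only, not data.table's implementation.

```r
b <- c(1, 0, 1, 1)

# Base-R stand-in for rleid(b == 0L): number the runs of
# consecutive zero / non-zero values
runs <- rle(b == 0)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
## [1] 1 2 3 3

# Cumulative sum within each run; a run of zeros always sums to 0
ave(b, run_id, FUN = cumsum)
## [1] 1 0 1 2
```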

Some benchmarks, following up on the comments:

set.seed(123)
x <- sample(0:1e3, 1e7, replace = TRUE)
# base R: ave() over groups that start at each 0
system.time(res1 <- ave(x, cumsum(x == 0), FUN = cumsum))
# user  system elapsed 
# 1.54    0.24    1.81 
# base R: Reduce() calls an R function once per element, hence the slowness
system.time(res2 <- Reduce(function(x, y) if (y == 0) 0 else x+y, x, accumulate=TRUE))
# user  system elapsed 
# 33.94    0.39   34.85 
# data.table: cumsum() by runs of zero / non-zero values
library(data.table)
system.time(res3 <- data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)])
# user  system elapsed 
# 0.20    0.00    0.21 

identical(res1, as.integer(res2))
## [1] TRUE
identical(res1, res3$whatiwant)
## [1] TRUE
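The random test vector above contains few zeros and no deliberate edge cases. As a small supplementary check (not part of the original answers), the base R approaches from this thread can be compared on a short vector with a leading zero, consecutive zeros and a trailing zero; the `ff()` logic is inlined so the block runs on its own.

```r
# Short vector with a leading zero, consecutive zeros and a trailing zero
x <- c(0, 1, 3, 0, 0, 5, 2, 0)
expected <- c(0, 1, 4, 0, 0, 5, 7, 0)

res_ave    <- ave(x, cumsum(x == 0), FUN = cumsum)
res_reduce <- Reduce(function(acc, y) if (y == 0) 0 else acc + y, x,
                     accumulate = TRUE)
cs         <- cumsum(x)
res_cummax <- cs - cummax((x == 0) * cs)   # same logic as ff(), inlined

# all three agree with the expected running totals
stopifnot(all.equal(res_ave, expected),
          all.equal(res_reduce, expected),
          all.equal(res_cummax, expected))
```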
answered Sep 28 '22 02:09 by David Arenburg