
Calculate cumsum from the end towards the beginning

Tags: r, reverse, cumsum

I'm trying to calculate a cumulative sum (cumsum) that starts at the last row and runs towards the first, separately for each group.

Sample data:

t1 <- data.frame(var = "a", val = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0))
t2 <- data.frame(var = "b", val = c(0,0,0,0,1,0,0,1,0,0,0,0,0,0,0))
ts <- rbind(t1, t2)

Desired output (grouped by var):

ts <- data.frame(var = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
                           "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"), 
                 val = c(2,2,2,2,2,1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,1,1,1,0,0,0,0,0,0,0))
asked May 18 '18 at 14:05 by adl



2 Answers

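Promoting my comment to an answer. Reverse each group's values, take the cumulative sum, and reverse the result back; with ave this is done per group: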

ts$val2 <- ave(ts$val, ts$var, FUN = function(x) rev(cumsum(rev(x))))

gives:

> ts
   var val val2
1    a   0    2
2    a   0    2
3    a   0    2
4    a   0    2
5    a   1    2
6    a   0    1
7    a   0    1
8    a   0    1
9    a   0    1
10   a   1    1
11   a   0    0
12   a   0    0
13   a   0    0
14   a   0    0
15   a   0    0
16   b   0    2
17   b   0    2
18   b   0    2
19   b   0    2
20   b   1    2
21   b   0    1
22   b   0    1
23   b   1    1
24   b   0    0
25   b   0    0
26   b   0    0
27   b   0    0
28   b   0    0
29   b   0    0
30   b   0    0
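To see why the double reversal works, here is a minimal sketch on a single short vector. The helper name rev_cumsum and the vector x are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: cumulative sum taken from the end towards the beginning
rev_cumsum <- function(x) rev(cumsum(rev(x)))

x <- c(0, 0, 1, 0, 1, 0)

step1 <- rev(x)          # last element first: 0 1 0 1 0 0
step2 <- cumsum(step1)   # running total from the (original) end: 0 1 1 2 2 2
step3 <- rev(step2)      # back in the original order: 2 2 2 1 1 0

result <- rev_cumsum(x)
print(result)
```

Each position ends up holding the sum of its own value and everything after it, which is what the question asks for within each group.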

Or with dplyr or data.table:

library(dplyr)
ts %>% 
  group_by(var) %>%
  mutate(val2 = rev(cumsum(rev(val))))

library(data.table)
setDT(ts)[, val2 := rev(cumsum(rev(val))), by = var]
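As a quick sanity check, the base R version can be compared against the desired output from the question. This is only a sketch: the names expected and matches are introduced here for illustration, and the data is rebuilt from the question so the snippet runs on its own.

```r
# Sample data as given in the question
t1 <- data.frame(var = "a", val = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0))
t2 <- data.frame(var = "b", val = c(0,0,0,0,1,0,0,1,0,0,0,0,0,0,0))
ts <- rbind(t1, t2)

# Desired values from the question (hypothetical variable name)
expected <- c(2,2,2,2,2,1,1,1,1,1,0,0,0,0,0,
              2,2,2,2,2,1,1,1,0,0,0,0,0,0,0)

# Reverse cumulative sum within each level of var
ts$val2 <- ave(ts$val, ts$var, FUN = function(x) rev(cumsum(rev(x))))

# TRUE when the computed column agrees with the desired output
matches <- isTRUE(all.equal(ts$val2, expected))
print(matches)
```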
answered Oct 19 '22 at 13:10 by Jaap


An option without explicitly reversing the vector:

ave(ts$val, ts$var, FUN = function(x) Reduce(sum, x, right = TRUE, accumulate = TRUE))

 [1] 2 2 2 2 2 1 1 1 1 1 0 0 0 0 0 2 2 2 2 2 1 1 1 0 0 0 0 0 0 0
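For a single vector, the following sketch shows that folding from the right with Reduce gives the same values as the reverse/cumsum/reverse approach. The vector x and the names via_reduce and via_rev are illustrative only.

```r
x <- c(0, 0, 1, 0, 1, 0)

# Fold from the right, keeping every intermediate sum
via_reduce <- Reduce(sum, x, right = TRUE, accumulate = TRUE)

# Explicit reversal, for comparison
via_rev <- rev(cumsum(rev(x)))

same <- isTRUE(all.equal(via_reduce, via_rev))
print(via_reduce)
print(same)
```

One design consideration: Reduce calls sum once per element from R code, so on long vectors it is likely to be slower than the vectorised cumsum; for inputs the size of the sample data the difference should not matter.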

Or the same approach with dplyr:

library(dplyr)
ts %>%
  group_by(var) %>%
  mutate(val = Reduce(sum, val, right = TRUE, accumulate = TRUE))
answered Oct 19 '22 at 13:10 by tmfmnk