Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional cumulative sum from two columns

I can't get my head around the following problem.

Assuming the follwoing data:

library(tidyverse)
df <- tibble(source = c("A", "A", "B", "B", "B", "C"),
             value  = c(5, 10, NA, NA, NA, 20),
             add    = c(1, 1, 1, 2, 3, 4))

What I want to do is: for all cases where source == "B", I want to calculate the cumulative sum of the previous row's value and the current row's add. Of course, for the first "B" row, I need to provide a starting value for value. Note: in this case, it would be fine if we just take the value from the last "A" row.

So for row 3, the result would be 10 + 1 = 11.
For row 4, the result would be 11 + 2 = 13.
For row 5, the results would be 13 + 3 = 16.

I tried to use purrr::accumulate, but I failed in many different ways, e.g. I thought I can do:

df %>%
  mutate(test = accumulate(add, .init = 10, ~.x + .y))

But this leads to error:

Error: Problem with `mutate()` column `test`.
i `test = accumulate(add, .init = 10, ~.x + .y)`.
i `test` must be size 6 or 1, not 7.

Same if I use .init = value

And I also didn't manage to do the job only on group B (although this is probably no issue, I think I can probably performa on the full data frame and then just replace values for all non-B rows).

Expected output:

# A tibble: 6 x 4
  source value   add  test
  <chr>  <dbl> <dbl> <dbl>
1 A          5     1    NA
2 A         10     1    NA
3 B         NA     1    11
4 B         NA     2    13
5 B         NA     3    16
6 C         20     4    NA
like image 265
deschen Avatar asked Oct 29 '25 07:10

deschen


1 Answers

You were essentially in the right direction. Since you provide an .init value to accumulate, the resulting vector is of size n+1, with the first value being .init. You have to remove the first value to get a vector that fit to your column size.

Then, if you want NAs on the remaining values, here's a way to do it. Also, since the "starting row" is the third, .init has to be set to 8.

df %>%
  mutate(test = 
           ifelse(source == "B", accumulate(add, .init = 8, ~.x + .y)[-1], NA))

# A tibble: 6 x 4
  source value   add  test
  <chr>  <dbl> <dbl> <dbl>
1 A          5     1    NA
2 A         10     1    NA
3 B         NA     1    11
4 B         NA     2    13
5 B         NA     3    16
6 C         20     4    NA
like image 155
Maël Avatar answered Oct 30 '25 23:10

Maël