Recursive sum over two variables using dplyr

Question

I have two columns with values, a and b. I want to add a third column c, which is (at row i) the sum from 0 to i of b plus the sum from 0 to (i-1) of c, multiplied with a, i.e.

c_i = (sum_i (b) + sum_(i-1) (c) ) * a_i

I tried

data %>%
mutate(
 c = a * (cumsum(b) + lag(cumsum(c), default = 0))
)

However this doesn't work, as I am just creating c based on values of c that don't exist at the moment:

Error: Problem with `mutate()` input `c`.
x object 'c' not found

Previously I handled such problems using for-loops. However, I got used to dplyr, and there is always a way. However, I do not get it.

I am grateful for any help!

edit: In a previous version I was inaccurate, as a is also a vector, not a constant. I changed it in the formula

The desired output:

row 1: 0.5 * (7  + 0 ) =3.5

row 2: 0.3 * (7+1 + 3.5) = 3.45

row 3: 1.0 * (7+1+9 + 3.5+3.45) = 23.95

| a | b | c |
|---|---|---|
|0.5|7|3.5|
|0.3|1|3.45|
|1|9|23.95|
|0.2|10|...|

AnilGoyal · Accepted Answer

Perhaps I would have done it in similar fashion like @27phi9. You may, however, do this without writing any function before hand. I am giving you three approaches (i) baseR, (ii) dplyr only, (iii) dplyr + purrr

df <- structure(list(a = c(0.5, 0.3, 1, 0.2, 0.4, 0.8), b = c(7L, 1L,  9L, 10L, 3L, 2L)), row.names = c(NA, -6L), class = c("tbl_df",  "tbl", "data.frame"))

transform(df, C = {x <- 0; Reduce(function(.x, .y){x <<- .x + x; (cumsum(b)[[.y]] + x) * a[[.y]]}, 
                                  seq(nrow(df)), 
                                  init = 0, 
                                  accumulate = TRUE)[-1]})
#>     a  b       C
#> 1 0.5  7  3.5000
#> 2 0.3  1  3.4500
#> 3 1.0  9 23.9500
#> 4 0.2 10 11.5800
#> 5 0.4  3 28.9920
#> 6 0.8  2 82.7776

library(dplyr)

df %>%
  mutate(C = {x <- 0; Reduce(function(.x, .y){x <<- .x + x; (cumsum(b)[[.y]] + x) * a[[.y]]}, 
                             seq(nrow(df)), 
                             init = 0, 
                             accumulate = TRUE)[-1]})
#> # A tibble: 6 x 3
#>       a     b     C
#>   <dbl> <int> <dbl>
#> 1   0.5     7  3.5 
#> 2   0.3     1  3.45
#> 3   1       9 24.0 
#> 4   0.2    10 11.6 
#> 5   0.4     3 29.0 
#> 6   0.8     2 82.8

library(purrr)
df %>%
  mutate(C = {x <- 0; unlist(accumulate2(cumsum(b), a, .init = 0, ~ {x <<- ..1 + x; (..2 + x) * ..3 }))[-1]})
#> # A tibble: 6 x 3
#>       a     b     C
#>   <dbl> <int> <dbl>
#> 1   0.5     7  3.5 
#> 2   0.3     1  3.45
#> 3   1       9 24.0 
#> 4   0.2    10 11.6 
#> 5   0.4     3 29.0 
#> 6   0.8     2 82.8

ThomasIsCoding · Answer

Update

A super efficient option is by solving the a linear matrix (thank @Martin Gal for comments):

transform(
  df,
  C = solve(
    `diag<-`(mat <- matrix(-a, length(a), length(a)), 1) * lower.tri(mat, diag = TRUE),
    a * cumsum(b)
  )
)

which gives

    a  b       C
1 0.5  7  3.5000
2 0.3  1  3.4500
3 1.0  9 23.9500
4 0.2 10 11.5800
5 0.4  3 28.9920
6 0.8  2 82.7776

or in a dplyr manner

df %>%
  mutate(
    C = solve(
      `diag<-`(mat <- matrix(-a, length(a), length(a)), 1) * lower.tri(mat, diag = TRUE),
      a * cumsum(b)
    )
  )

which gives

# A tibble: 6 x 3
      a     b     C
  <dbl> <int> <dbl>
1   0.5     7  3.5
2   0.3     1  3.45
3   1       9 24.0
4   0.2    10 11.6
5   0.4     3 29.0
6   0.8     2 82.8

Previous answer (Recursion approach, INFFICIENT)

A base R option (but inefficient) by defining a recursion function f

f <- function(k) {
  if (k == 1) {
    return(with(df[k, ], a * b))
  }
  r <- f(k - 1)
  c(r, with(df, a[k] * (sum(b[1:k]) + sum(r))))
}

and you will see

> f(nrow(df))
[1]  3.5000  3.4500 23.9500 11.5800 28.9920 82.7776

and

> df %>%
+   mutate(C = f(n()))
# A tibble: 6 x 3
      a     b     C
  <dbl> <int> <dbl>
1   0.5     7  3.5
2   0.3     1  3.45
3   1       9 24.0
4   0.2    10 11.6
5   0.4     3 29.0
6   0.8     2 82.8

Recursive sum over two variables using dplyr

Tags:

performance

dataframe

r

recursion

dplyr

C. Sebastian

2 Answers

AnilGoyal

Update

Previous answer (Recursion approach, INFFICIENT)

ThomasIsCoding

Recent Activity

Donate For Us

Recursive sum over two variables using dplyr

Tags:

performance

dataframe

r

recursion

dplyr

C. Sebastian

2 Answers

AnilGoyal

Update

Previous answer (Recursion approach, INFFICIENT)

ThomasIsCoding

Related questions

Recent Activity

Donate For Us