
Cumulative product of (1-previous_record)*current_record

The data frame contains two variables (time and rate) and 10 observations:

time <- 1:10
rate <- 1-(0.99^time)
dat <- data.frame(time, rate)

I need to add a new column (called new_rate).

new_rate is defined as follows

Note: new_rate_1 is the first observation of the new column new_rate, new_rate_2 is the second, and so on.

new_rate_1 = rate_1
new_rate_2 = (1-rate_1)*rate_2
new_rate_3 = (1-rate_1)*(1-rate_2)*rate_3
new_rate_4 = (1-rate_1)*(1-rate_2)*(1-rate_3)*rate_4
...
new_rate_10 = (1-rate_1)*(1-rate_2)*(1-rate_3)*(1-rate_4)*(1-rate_5)*(1-rate_6)*(1-rate_7)*(1-rate_8)*(1-rate_9)*rate_10

How can this be done in base R or dplyr?

user9292 asked Jul 23 '20 00:07




2 Answers

cumprod to the rescue (hat-tip to @Cole for simplifying the code):

dat$rate * c(1, cumprod(1 - head(dat$rate, -1)))

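The logic is that each new value is the current rate multiplied by the cumulative product of 1 - dat$rate over all earlier rows.
The first row has no earlier rows, so its multiplier is 1 and the existing value is kept. For the remaining rows the two vectors must be offset by one position: head(dat$rate, -1) drops the last rate and c(1, ...) prepends the 1, so each rate lines up with the product over the rows before it.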
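
Since the question asks for a new column, here is a minimal sketch of how the result could be stored in dat. It rebuilds the example data from the question; the intermediate name prev_factor is only illustrative and is not part of the original answer.

```r
# Rebuild the example data from the question
time <- 1:10
rate <- 1 - 0.99^time
dat <- data.frame(time, rate)

# Product of (1 - rate) over all earlier rows.
# The first row has no earlier rows, hence the leading 1.
# (prev_factor is a hypothetical helper name.)
prev_factor <- c(1, cumprod(1 - head(dat$rate, -1)))

# Current rate times the product over the previous rows
dat$new_rate <- dat$rate * prev_factor
```

Keeping the offset vector in its own variable makes it easier to inspect the shift before multiplying.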

Proof:

out <- c(
dat$rate[1],
(1-dat$rate[1])*dat$rate[2],
(1-dat$rate[1])*(1-dat$rate[2])*dat$rate[3],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*dat$rate[4],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*dat$rate[5],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*dat$rate[6],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*dat$rate[7],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*dat$rate[8],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*dat$rate[9],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*(1-dat$rate[9])*dat$rate[10]
)

all.equal(
  dat$rate * c(1, cumprod(1 - head(dat$rate, -1))),
  out
)
#[1] TRUE
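
The same check can probably be generated instead of typed out by hand. The sketch below is one way to do it for a vector of any length; brute and vectorised are hypothetical names, and it relies on prod() of an empty vector being 1 for the first element.

```r
rate <- 1 - 0.99^(1:10)

# For each position i: product of (1 - rate) over positions 1..(i - 1),
# times rate[i]. For i = 1 the index is empty, so prod() returns 1.
brute <- sapply(seq_along(rate), function(i) {
  prod(1 - rate[seq_len(i - 1)]) * rate[i]
})

# The cumprod one-liner, applied to the same vector
vectorised <- rate * c(1, cumprod(1 - head(rate, -1)))

# Compare the two results up to floating-point tolerance
check <- all.equal(vectorised, brute)
```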
thelatemail answered Sep 22 '22 16:09



A straightforward math approach using cumprod should work:

> c(1, head(cumprod(1 - rate), -1)) * rate
 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
 [7] 0.05500858 0.05830607 0.06022773 0.06083074
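
The question also asks about dplyr. A minimal sketch, assuming the dplyr package is installed: lag() with default = 1 should play the same role as the c(1, head(..., -1)) offset, shifting the cumulative product down by one row.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(time = 1:10, rate = 1 - 0.99^(1:10))

# lag() shifts the cumulative product down one row;
# default = 1 fills the first row, which has no earlier rows
dat <- dat %>%
  mutate(new_rate = rate * lag(cumprod(1 - rate), default = 1))
```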

If you want to practice recursion, you can try the method below:

f <- function(v, k = length(v)) {
    if (k == 1) {
        return(v[k])
    }
    u <- f(v, k - 1)
    c(u, tail(u, 1) * (1 / v[k - 1] - 1) * v[k])
}

which gives

> f(rate)
 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
 [7] 0.05500858 0.05830607 0.06022773 0.06083074
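
One caveat on the recursive version: it divides by v[k - 1], so a rate of exactly zero in the middle of the vector leads to NaN. If that matters for your data, a plain loop that carries the running product forward avoids the division. This is only a sketch, and new_rate_loop is a hypothetical name:

```r
new_rate_loop <- function(v) {
  out <- numeric(length(v))
  surv <- 1  # running product of (1 - v) over the earlier elements
  for (i in seq_along(v)) {
    out[i] <- surv * v[i]      # current value times product of earlier (1 - v)
    surv <- surv * (1 - v[i])  # update the product for the next element
  }
  out
}

rate <- 1 - 0.99^(1:10)
res <- new_rate_loop(rate)

# A vector containing a zero rate, which the recursive version cannot handle
res_zero <- new_rate_loop(c(0.1, 0, 0.2))
```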
ThomasIsCoding answered Sep 22 '22 16:09
