The data frame contains two variables (time and rate) and 10 observations:
time <- 1:10
rate <- 1-(0.99^time)
dat <- data.frame(time, rate)
I need to add a new column called new_rate, which is defined as follows.
Note: new_rate_1 is the first observation of the new column new_rate, new_rate_2 is the second, and so on.
new_rate_1 = rate_1
new_rate_2 = (1-rate_1)*rate_2
new_rate_3 = (1-rate_1)*(1-rate_2)*rate_3
new_rate_4 = (1-rate_1)*(1-rate_2)*(1-rate_3)*rate_4
...
new_rate_10 = (1-rate_1)*(1-rate_2)*(1-rate_3)*(1-rate_4)*(1-rate_5)*(1-rate_6)*(1-rate_7)*(1-rate_8)*(1-rate_9)*rate_10
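In compact form, and assuming the pattern continues the same way for every row, the definition above could be written as follows. The index names k and i are introduced here only for illustration, and the empty product at k = 1 is taken to be 1:

```latex
\[
\text{new\_rate}_k \;=\; \text{rate}_k \prod_{i=1}^{k-1} \bigl(1 - \text{rate}_i\bigr),
\qquad k = 1, \dots, 10 .
\]
```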
How can this be done in base R or dplyr?
cumprod(x) returns a vector whose elements are the cumulative products of the elements of x, that is, the sequence of partial products x[1], x[1]*x[2], x[1]*x[2]*x[3], and so on.
cumprod to the rescue (hat-tip to @Cole for simplifying the code):
dat$rate * c(1, cumprod(1 - head(dat$rate, -1)))
The logic is that you are essentially taking a cumulative product of 1 - dat$rate, multiplied by the rate at the current step.
At the first step, you can just keep the existing value, but then you need to offset the two vectors so that the multiplication gives the desired result.
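To make the offset concrete, here is a minimal sketch that builds the shifted cumulative product as its own vector and then stores the result as the new column. The name carry is hypothetical and used only for illustration; the data are rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
time <- 1:10
rate <- 1 - 0.99^time
dat <- data.frame(time, rate)

# carry[k] is the product of (1 - rate) over all steps before k.
# head(x, -1) drops the last element; the leading 1 covers the first step.
carry <- c(1, cumprod(1 - head(dat$rate, -1)))

# Multiply the shifted product by the current rate and store it
dat$new_rate <- dat$rate * carry

# With dplyr (assumed to be installed), the same expression should also
# work inside mutate():
# dat <- dplyr::mutate(dat, new_rate = rate * c(1, cumprod(1 - head(rate, -1))))
```

Keeping carry as a separate vector is optional; it only makes the one-position shift between the two vectors easier to inspect.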
Proof:
out <- c(
dat$rate[1],
(1-dat$rate[1])*dat$rate[2],
(1-dat$rate[1])*(1-dat$rate[2])*dat$rate[3],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*dat$rate[4],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*dat$rate[5],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*dat$rate[6],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*dat$rate[7],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*dat$rate[8],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*dat$rate[9],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*(1-dat$rate[9])*dat$rate[10]
)
all.equal(
dat$rate * c(1, cumprod(1 - head(dat$rate, -1))),
out
)
#[1] TRUE
A straightforward math approach using cumprod should work:
> c(1, head(cumprod(1 - rate), -1)) * rate
[1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
[7] 0.05500858 0.05830607 0.06022773 0.06083074
If you want to practice with recursion, you can try the method below:
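f <- function(v, k = length(v)) {
  # Base case: the first element is kept unchanged
  if (k == 1) {
    return(v[k])
  }
  # Results for the first k - 1 elements
  u <- f(v, k - 1)
  # Step from the previous result: divide out v[k - 1], then multiply by (1 - v[k - 1]) * v[k]
  c(u, tail(u, 1) * (1 / v[k - 1] - 1) * v[k])
}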
such that
> f(rate)
[1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
[7] 0.05500858 0.05830607 0.06022773 0.06083074
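Note that the recursive version divides by v[k - 1], so a rate of exactly 0 leads to 0 * Inf, which is NaN in R. As a hedged alternative, here is a small loop sketch that carries the running product forward instead of dividing. The function name new_rate_loop and the variable remaining are made up for this example.

```r
# Iterative sketch: carry the product of (1 - v) forward, no division needed
new_rate_loop <- function(v) {
  out <- numeric(length(v))
  remaining <- 1                      # product of (1 - v[i]) for all i < k
  for (k in seq_along(v)) {
    out[k] <- remaining * v[k]        # contribution at step k
    remaining <- remaining * (1 - v[k])
  }
  out
}

rate <- 1 - 0.99^(1:10)
res <- new_rate_loop(rate)

# Should agree with the cumprod one-liner from the answers above
all.equal(res, c(1, head(cumprod(1 - rate), -1)) * rate)
```

For vectors of ordinary length the cumprod one-liner is likely the better choice; the loop is mainly useful for seeing the recurrence step by step.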