I have recently come across an interesting question of calculating a vector values using its penultimate value as <code>.init</code> argument plus an additional vector's current value. Here is the sample data set: <pre class="prettyprint"><code>set.seed(13) dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100) dt$ind <- if_else(dt$time == 1, 120, if_else(dt$time == 2, 125, as.numeric(NA))) id time ret ind 1 a 1 0.005543269 120 2 a 2 -0.002802719 125 3 a 3 0.017751634 NA 4 a 4 0.001873201 NA 5 a 5 0.011425261 NA 6 b 1 0.004155261 120 7 b 2 0.012295066 125 8 b 3 0.002366797 NA 9 b 4 -0.003653828 NA 10 b 5 0.011051443 NA </code></pre> What I would like to calculate is: <blockquote> ind_{t} = ind_{t-2}*(1+ret_{t}) </blockquote> I tried the following code. Since <code>.init</code> is of no use here I tried the nullify the original <code>.init</code> and created a virtual <code>.init</code> but unfortunately it won't drag the newly created values (from third row downward) into calculation: <pre class="prettyprint"><code>dt %>% group_by(id) %>% mutate(ind = c(120, accumulate(3:n(), .init = 125, ~ .x * 1/.x * ind[.y - 2] * (1 + ret[.y])))) # A tibble: 10 x 4 # Groups: id [2] id time ret ind <chr> <int> <dbl> <dbl> 1 a 1 0.00554 120 2 a 2 -0.00280 125 3 a 3 0.0178 122. 4 a 4 0.00187 125. 5 a 5 0.0114 NA 6 b 1 0.00416 120 7 b 2 0.0123 125 8 b 3 0.00237 120. 9 b 4 -0.00365 125. 10 b 5 0.0111 NA </code></pre> I was wondering if there was a tweak I could make to this code and make it work completely. I would appreciate your help greatly in advance

I would have done it by creating dummy groups for each sequence, so that it can be done for any number of 'N'. Demonstrating it on a new elaborated data <pre class="prettyprint lang-r prettyprint-override"><code>df <- data.frame( stringsAsFactors = FALSE, grp = c("a","a","a","a", "a","a","a","a","a","b","b","b","b","b", "b","b","b","b"), rate = c(0.082322056, 0.098491104,0.07294593,0.08741672,0.030179747, 0.061389031,0.011232314,0.08553277,0.091272669, 0.031577847,0.024039791,0.091719552,0.032540636, 0.020411727,0.094521716,0.081729178,0.066429708, 0.04985793), ind = c(11000L,12000L, 13000L,NA,NA,NA,NA,NA,NA,10000L,13000L,12000L, NA,NA,NA,NA,NA,NA) ) df #> grp rate ind #> 1 a 0.08232206 11000 #> 2 a 0.09849110 12000 #> 3 a 0.07294593 13000 #> 4 a 0.08741672 NA #> 5 a 0.03017975 NA #> 6 a 0.06138903 NA #> 7 a 0.01123231 NA #> 8 a 0.08553277 NA #> 9 a 0.09127267 NA #> 10 b 0.03157785 10000 #> 11 b 0.02403979 13000 #> 12 b 0.09171955 12000 #> 13 b 0.03254064 NA #> 14 b 0.02041173 NA #> 15 b 0.09452172 NA #> 16 b 0.08172918 NA #> 17 b 0.06642971 NA #> 18 b 0.04985793 NA library(tidyverse) N = 3 df %>% group_by(grp) %>% group_by(d = row_number() %% N, .add = TRUE) %>% mutate(ind = accumulate(rate[-1] + 1, .init = ind[1], ~ .x * .y)) #> # A tibble: 18 x 4 #> # Groups: grp, d [6] #> grp rate ind d #> <chr> <dbl> <dbl> <dbl> #> 1 a 0.0823 11000 1 #> 2 a 0.0985 12000 2 #> 3 a 0.0729 13000 0 #> 4 a 0.0874 11962. 1 #> 5 a 0.0302 12362. 2 #> 6 a 0.0614 13798. 0 #> 7 a 0.0112 12096. 1 #> 8 a 0.0855 13420. 2 #> 9 a 0.0913 15057. 0 #> 10 b 0.0316 10000 1 #> 11 b 0.0240 13000 2 #> 12 b 0.0917 12000 0 #> 13 b 0.0325 10325. 1 #> 14 b 0.0204 13265. 2 #> 15 b 0.0945 13134. 0 #> 16 b 0.0817 11169. 1 #> 17 b 0.0664 14147. 2 #> 18 b 0.0499 13789. 0 </code></pre> <hr> Alternate answer in <code>dplyr</code> (using your own data modified a bit only) <pre class="prettyprint lang-r prettyprint-override"><code>set.seed(13) dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100) dt$ind <- ifelse(dt$time == 1, 12000, ifelse(dt$time == 2, 12500, as.numeric(NA))) library(dplyr, warn.conflicts = F) dt %>% group_by(id) %>% group_by(d= row_number() %% 2, .add = TRUE) %>% mutate(ind = cumprod(1 + duplicated(id) * ret)* ind[1]) #> # A tibble: 10 x 5 #> # Groups: id, d [4] #> id time ret ind d #> <chr> <int> <dbl> <dbl> <dbl> #> 1 a 1 0.00554 12000 1 #> 2 a 2 -0.00280 12500 0 #> 3 a 3 0.0178 12213. 1 #> 4 a 4 0.00187 12523. 0 #> 5 a 5 0.0114 12353. 1 #> 6 b 1 0.00416 12000 0 #> 7 b 2 0.0123 12500 1 #> 8 b 3 0.00237 12028. 0 #> 9 b 4 -0.00365 12454. 1 #> 10 b 5 0.0111 12161. 0 </code></pre>

Using accumulate function with second to last value as .init argument

Tags:

r

dplyr

purrr

rolling-computation

accumulate

I have recently come across an interesting question of calculating a vector values using its penultimate value as .init argument plus an additional vector's current value. Here is the sample data set:

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- if_else(dt$time == 1, 120, if_else(dt$time == 2, 125, as.numeric(NA)))

   id time          ret ind
1   a    1  0.005543269 120
2   a    2 -0.002802719 125
3   a    3  0.017751634  NA
4   a    4  0.001873201  NA
5   a    5  0.011425261  NA
6   b    1  0.004155261 120
7   b    2  0.012295066 125
8   b    3  0.002366797  NA
9   b    4 -0.003653828  NA
10  b    5  0.011051443  NA

What I would like to calculate is:

ind_{t} = ind_{t-2}*(1+ret_{t})

I tried the following code. Since .init is of no use here I tried the nullify the original .init and created a virtual .init but unfortunately it won't drag the newly created values (from third row downward) into calculation:

dt %>%
  group_by(id) %>%
  mutate(ind = c(120, accumulate(3:n(), .init = 125, 
                                 ~ .x * 1/.x * ind[.y - 2] * (1 + ret[.y]))))

# A tibble: 10 x 4
# Groups:   id [2]
   id     time      ret   ind
   <chr> <int>    <dbl> <dbl>
 1 a         1  0.00554  120 
 2 a         2 -0.00280  125 
 3 a         3  0.0178   122.
 4 a         4  0.00187  125.
 5 a         5  0.0114    NA 
 6 b         1  0.00416  120 
 7 b         2  0.0123   125 
 8 b         3  0.00237  120.
 9 b         4 -0.00365  125.
10 b         5  0.0111    NA

I was wondering if there was a tweak I could make to this code and make it work completely. I would appreciate your help greatly in advance

759

asked Jul 28 '21 15:07

Anoushiravan R

Video Answer

2 Answers

Use a state vector consisting of the current value of ind and the prior value of ind. That way the prior state contains the second prior value of ind. We encode that into complex values with the real part equal to ind and the imaginary part equal to the prior value of ind. At the end we take the real part.

library(dplyr)
library(purrr)

dt %>%
  group_by(id) %>%
  mutate(result = c(ind[1],
                    Re(accumulate(.x = tail(ret, -2), 
                                  .f = ~ Im(.x) * (1 + .y) + Re(.x) * 1i,
                                  .init = ind[2] + ind[1] * 1i)))) %>%
  ungroup

giving:

# A tibble: 10 x 5
   id     time      ret   ind result
   <chr> <int>    <dbl> <dbl>  <dbl>
 1 a         1  0.00554   120   120 
 2 a         2 -0.00280   125   125 
 3 a         3  0.0178     NA   122.
 4 a         4  0.00187    NA   125.
 5 a         5  0.0114     NA   124.
 6 b         1  0.00416   120   120 
 7 b         2  0.0123    125   125 
 8 b         3  0.00237    NA   120.
 9 b         4 -0.00365    NA   125.
10 b         5  0.0111     NA   122.

Variation

This variation eliminates the complex numbers and uses a vector of 2 elements in place of each complex number with the first number corresponding to the real part in the prior solution and the second number of each pair corresponding to the imaginary part. This could be extended to cases where we need more than 2 numbers per state and where the dependence involves all of the last N values but for the question here there is the downside of the extra line of code to extract the result from the list of pairs of numbers which is more involved than using Re in the prior solution.

dt %>%
  group_by(id) %>%
  mutate(result = c(ind[1],
                    accumulate(.x = tail(ret, -2), 
                               .f = ~ c(.x[2] * (1 + .y), .x[1]),
                               .init = ind[2:1])),
         result = map_dbl(result, first)) %>%
  ungroup

Check

We check that the results above are correct. Alternately this could be used as a straight forward solution.

calc <- function(ind, ret) {
  for(i in seq(3, length(ret))) ind[i] <- ind[i-2] * (1 + ret[i])
  ind
}

dt %>%
  group_by(id) %>%
  mutate(result = calc(ind, ret)) %>%
  ungroup

giving:

# A tibble: 10 x 5
   id     time      ret   ind result
   <chr> <int>    <dbl> <dbl>  <dbl>
 1 a         1  0.00554   120   120 
 2 a         2 -0.00280   125   125 
 3 a         3  0.0178     NA   122.
 4 a         4  0.00187    NA   125.
 5 a         5  0.0114     NA   124.
 6 b         1  0.00416   120   120 
 7 b         2  0.0123    125   125 
 8 b         3  0.00237    NA   120.
 9 b         4 -0.00365    NA   125.
10 b         5  0.0111     NA   122.

127

answered Oct 24 '22 09:10

G. Grothendieck

I would have done it by creating dummy groups for each sequence, so that it can be done for any number of 'N'. Demonstrating it on a new elaborated data

df <- data.frame(
  stringsAsFactors = FALSE,
                     grp = c("a","a","a","a",
                             "a","a","a","a","a","b","b","b","b","b",
                             "b","b","b","b"),
                    rate = c(0.082322056,
                             0.098491104,0.07294593,0.08741672,0.030179747,
                             0.061389031,0.011232314,0.08553277,0.091272669,
                             0.031577847,0.024039791,0.091719552,0.032540636,
                             0.020411727,0.094521716,0.081729178,0.066429708,
                             0.04985793),
                     ind = c(11000L,12000L,
                             13000L,NA,NA,NA,NA,NA,NA,10000L,13000L,12000L,
                             NA,NA,NA,NA,NA,NA)
      )
df
#>    grp       rate   ind
#> 1    a 0.08232206 11000
#> 2    a 0.09849110 12000
#> 3    a 0.07294593 13000
#> 4    a 0.08741672    NA
#> 5    a 0.03017975    NA
#> 6    a 0.06138903    NA
#> 7    a 0.01123231    NA
#> 8    a 0.08553277    NA
#> 9    a 0.09127267    NA
#> 10   b 0.03157785 10000
#> 11   b 0.02403979 13000
#> 12   b 0.09171955 12000
#> 13   b 0.03254064    NA
#> 14   b 0.02041173    NA
#> 15   b 0.09452172    NA
#> 16   b 0.08172918    NA
#> 17   b 0.06642971    NA
#> 18   b 0.04985793    NA

library(tidyverse)
N = 3

df %>% group_by(grp) %>%
  group_by(d = row_number() %% N, .add = TRUE) %>%
  mutate(ind = accumulate(rate[-1] + 1, .init = ind[1], ~ .x * .y))
#> # A tibble: 18 x 4
#> # Groups:   grp, d [6]
#>    grp     rate    ind     d
#>    <chr>  <dbl>  <dbl> <dbl>
#>  1 a     0.0823 11000      1
#>  2 a     0.0985 12000      2
#>  3 a     0.0729 13000      0
#>  4 a     0.0874 11962.     1
#>  5 a     0.0302 12362.     2
#>  6 a     0.0614 13798.     0
#>  7 a     0.0112 12096.     1
#>  8 a     0.0855 13420.     2
#>  9 a     0.0913 15057.     0
#> 10 b     0.0316 10000      1
#> 11 b     0.0240 13000      2
#> 12 b     0.0917 12000      0
#> 13 b     0.0325 10325.     1
#> 14 b     0.0204 13265.     2
#> 15 b     0.0945 13134.     0
#> 16 b     0.0817 11169.     1
#> 17 b     0.0664 14147.     2
#> 18 b     0.0499 13789.     0

Alternate answer in dplyr (using your own data modified a bit only)

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- ifelse(dt$time == 1, 12000, ifelse(dt$time == 2, 12500, as.numeric(NA)))

library(dplyr, warn.conflicts = F)

dt %>% group_by(id) %>%
  group_by(d= row_number() %% 2, .add = TRUE) %>%
  mutate(ind = cumprod(1 + duplicated(id) * ret)* ind[1])
#> # A tibble: 10 x 5
#> # Groups:   id, d [4]
#>    id     time      ret    ind     d
#>    <chr> <int>    <dbl>  <dbl> <dbl>
#>  1 a         1  0.00554 12000      1
#>  2 a         2 -0.00280 12500      0
#>  3 a         3  0.0178  12213.     1
#>  4 a         4  0.00187 12523.     0
#>  5 a         5  0.0114  12353.     1
#>  6 b         1  0.00416 12000      0
#>  7 b         2  0.0123  12500      1
#>  8 b         3  0.00237 12028.     0
#>  9 b         4 -0.00365 12454.     1
#> 10 b         5  0.0111  12161.     0

answered Oct 24 '22 08:10

AnilGoyal

Related questions
                            
                                Query to Snowflake database isn't working because no active warehouse is selected
                            
                                Specify the dots argument when calling a tidyselect-using function without needing to specify the preceding arguments
                            
                                Conditionally determining value of column by looking at last group
                            
                                Button to view in full screen
                            
                                "Debug location is approximate because the source is not available" in R 4.0.0 + RStudio
                            
                                An analog to rnorm in python
                            
                                r summarize_if with multiple conditions
                            
                                R devtools unable to install - Ubuntu 20.04 - package or namespace load failed for ‘pkgload’
                            
                                Pause and resume caret training in R
                            
                                Numbered captions on customized and reactive figure in R markdown HTML file
                            
                                Standardize variables using dplyr [r]
                            
                                Fable: Extracting the p,d,q specification from an ARIMA model
                            
                                How to shade shapes
                            
                                Column to nested list separated by /
                            
                                How to select a specific tab in R Markdown?
                            
                                How to plot 'outside' of plotting area using ggplot in R?
                            
                                Passing files from a rocker container to a latex container within a gitlab-ci job
                            
                                How to use ifelse inside map function in R
                            
                                Parallel processing in R - setting seed with mclapply() vs. pbmclapply()
                            
                                Adding labels to plotly map created using plot_geo

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With