I wish to have a fast way to deal with rowwise calculations where the values of cells depend on values in previous rows of different columns, preferring vectorization over looping through individual rows (follow-up from here).
Say I have the following dataset dt and a constant (loaded libraries are data.table, dplyr and purrr):
dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072,
-18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA), var3 = c(1, NA, NA, NA,
NA, NA)), class = c("data.table", "data.frame"), row.names = c(NA,
-6L))
constant <- 608383
print(dt)
var1 var2 var3
1: -92186.747 -3.12 1
2: -19163.504 NA NA
3: -18178.840 NA NA
4: -9844.679 NA NA
5: -16494.780 NA NA
6: -17088.058 NA NA
The fast, vectorized equivalent of
for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]
  dt[i, var2 := prev$var2 - var1/constant]
}
would be
dt %>%
  mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y/constant))
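A quick sanity check that the loop and the accumulate() call agree (a minimal sketch; data.table::copy() is used so the loop does not modify dt by reference):

dt_loop <- copy(dt)
for (i in 2:nrow(dt_loop)) {
  prev <- dt_loop[(i - 1), ]
  dt_loop[i, var2 := prev$var2 - var1/constant]
}

dt_acc <- copy(dt) %>%
  mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y/constant))

all.equal(dt_loop$var2, dt_acc$var2)
#> TRUE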
But what if I want to include more columns in the calculation? In this example that is var3, but in the real dataset there are >10 columns, and I want the solution to take that into account. Example for loop (desired output):
for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]
  dt[i, var2 := prev$var2 + prev$var3 - var1/constant]
  dt[i, var3 := prev$var1 + 0.1 * var2/constant]
}
print(dt)
var1 var2 var3
1: -92186.747 -3.120000e+00 1.00
2: -19163.504 -2.088501e+00 -92186.75
3: -18178.840 -9.218881e+04 -19163.52
4: -9844.679 -1.113523e+05 -18178.86
5: -16494.780 -1.295311e+05 -9844.70
6: -17088.058 -1.393758e+05 -16494.80
Another option using Rcpp:
library(Rcpp)
cppFunction('List func(NumericVector var1, double c, double v2, double v3) {
  int n = var1.size();
  NumericVector var2(n);
  NumericVector var3(n);
  // seed both recurrences with the first-row values
  var2[0] = v2;
  var3[0] = v3;
  for (int i = 1; i < n; i++) {
    var2[i] = var2[i-1] + var3[i-1] - var1[i]/c;
    var3[i] = var1[i-1] + 0.1 * var2[i]/c;
  }
  List ret;
  ret["var2"] = var2;
  ret["var3"] = var3;
  return ret;
}')
dt[, c("var2", "var3") := func(var1, constant, var2[1L], var3[1L])]
Though my friend's output/strategy is fabulous, we cannot have two input vectors in base R's Reduce(), so I used this trick:

- carry var1 along in the data.frame() built inside Reduce(): for the current value of var1 use .y, and for the previous row's value use .x$var1 instead;
- accumulate = TRUE is obvious because you want all intermediate values;
- finally, the list of one-row data frames is stacked with rbind using do.call.
In base R you can do
do.call(rbind, Reduce(function(.x, .y) {
  data.frame(var1 = .y,
             var2 = .x$var2 + .x$var3 - .y/constant,
             var3 = .x$var1 + 0.1 * (.x$var2 + .x$var3 - .y/constant)/constant)
}, dt$var1[-1],
   init = data.frame(var1 = dt$var1[1], var2 = -3.12, var3 = 1),
   accumulate = TRUE))
var1 var2 var3
1 -92186.747 -3.120000e+00 1.00
2 -19163.504 -2.088501e+00 -92186.75
3 -18178.840 -9.218881e+04 -19163.52
4 -9844.679 -1.113523e+05 -18178.86
5 -16494.780 -1.295311e+05 -9844.70
6 -17088.058 -1.393758e+05 -16494.80
which can be emulated in tidyverse/purrr as follows:
library(purrr)
accumulate(dt$var1[-1],
           .init = data.frame(var1 = dt$var1[1], var2 = -3.12, var3 = 1),
           ~ data.frame(var1 = .y,
                        var2 = .x$var2 + .x$var3 - (.y/constant),
                        var3 = .x$var1 + 0.1 * (.x$var2 + .x$var3 - (.y/constant))/constant)) %>%
  map_df(~.x)
var1 var2 var3
1 -92186.747 -3.120000e+00 1.00
2 -19163.504 -2.088501e+00 -92186.75
3 -18178.840 -9.218881e+04 -19163.52
4 -9844.679 -1.113523e+05 -18178.86
5 -16494.780 -1.295311e+05 -9844.70
6 -17088.058 -1.393758e+05 -16494.80
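Since the real dataset has more than 10 columns, the same accumulate() pattern scales if the whole recurrence lives in a single helper, so each extra column is one more assignment there. A minimal sketch of that refactoring (step is a hypothetical name; the update rules are the two from the example):

step <- function(prev, v1) {
  # one assignment per column that depends on the previous row
  var2 <- prev$var2 + prev$var3 - v1/constant
  var3 <- prev$var1 + 0.1 * var2/constant
  data.frame(var1 = v1, var2 = var2, var3 = var3)
}

accumulate(dt$var1[-1],
           .init = data.frame(var1 = dt$var1[1], var2 = -3.12, var3 = 1),
           ~ step(.x, .y)) %>%
  map_df(~.x)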