I have to speed up my script. I have some cycles like:
DT <- data.frame(Index=1:20, A=c(10:29))
cost1 <- 3
cost2 <- 0.05
cost3 <- 50
DT$S[1] <- cost1
for (j in 2:(20)) {
  DT$S[j] <- DT$S[j-1]-cost3+DT$S[j-1]*cost2/12
}
Where cost1 and cost2 are constants. Is it possible to avoid writing a cycle?
Loops are slower in R than in C++ because R is an interpreted language (not compiled), even if now there is just-in-time (JIT) compilation in R (>= 3.4) that makes R loops faster (yet, still not as fast). Then, R loops are not that bad if you don't use too many iterations (let's say not more than 100,000 iterations).
Indeed, R for loops are inefficient, especially if you use them wrong. Searching for why R loops are slow discovers that many users are wondering about this question. Below, I summarize my experience and online discussions regarding this issue by providing some trivial code examples.
The main problem with your approach is that you are repeatedly calling elements of data.frame (DT$S), but that is not needed in this calculations. If we replace that with vector and add the results to data.frame at the end, it is much faster. Also we can simplify the formula.
n <- 1e4
DT <- data.frame(Index = 1:n, A = seq(10, by = 1, length.out = n))
cost1 <- 3
cost2 <- 0.05
cost3 <- 50
your <- function() {
  DT$S[1] <- cost1
  for (j in 2:(n)) {
    DT$S[j] <- DT$S[j - 1] - cost3 + DT$S[j - 1]*cost2/12
  }
}
your()
My function:
my <- function() {    
  cc <- (1 + cost2/12)      
  r <- vector('numeric', length = n)
  r[1] <- cost1
  for (j in 2:(n)) {
    # r[j] <- r[j - 1] - cost3 + r[j - 1] * cost2/12
    r[j] <-  r[j - 1] * cc - cost3
  }
  r
}
DT$S2 <- my()
all.equal(DT$S, DT$S2)
# [1] TRUE
microbenchmark::microbenchmark(your(), my(), times = 2)
# Unit: milliseconds
#   expr        min         lq      mean    median         uq        max neval cld
# your() 487.229621 487.229621 490.86917 490.86917 494.508715 494.508715     2   b
#   my()   1.515178   1.515178   1.59408   1.59408   1.672982   1.672982     2  a 
                        Your column S is defined by a first-order linear recurrence. The i-th term can be expressed in function of i, see e.g. these slides.
> DT <- data.frame(Index=1:20)
> cost1 <- 3; cost2 <- 0.05; cost3 <- 50
> DT$S[1] <- cost1
> for (j in 2:(20)) {
+   DT$S[j] <- DT$S[j-1]-cost3+DT$S[j-1]*cost2/12
+ }
> DT$S
 [1]    3.00000  -46.98750  -97.18328 -147.58821 -198.20316 -249.02901 -300.06663 -351.31691
 [9] -402.78073 -454.45898 -506.35256 -558.46236 -610.78929 -663.33424 -716.09814 -769.08188
[17] -822.28639 -875.71258 -929.36138 -983.23372
> s <- 1+cost2/12
> s_powers <- s^(1:(N-1))
> cost1*s_powers - cost3*(1-s_powers)/(1-s)
 [1]  -46.98750  -97.18328 -147.58821 -198.20316 -249.02901 -300.06663 -351.31691 -402.78073
 [9] -454.45898 -506.35256 -558.46236 -610.78929 -663.33424 -716.09814 -769.08188 -822.28639
[17] -875.71258 -929.36138 -983.23372
Let's compare four ways.
f1 <- function(){ # your way
  DT$S[1] <- cost1
  for (j in 2:N) {
    DT$S[j] <- DT$S[j-1]-cost3+DT$S[j-1]*cost2/12
  }
}
f2 <- function(){ # group the two DT$S[j-1] (cause DT$S[j-1] is slow)
  DT$S[1] <- cost1
  for (j in 2:N) {
    DT$S[j] <- (1+cost2/12)*DT$S[j-1]-cost3
  }
}
f3 <- function(){ # avoid DT$S[j-1] (@minem's answer)
  u <- numeric(N)
  u[1] <- cost1
  for (j in 2:N) {
    u[j] <- (1+cost2/12)*u[j-1]-cost3
  }
  DT$S <- u
}
f4 <- function(){ # express DT$S[j] in function of j
  s <- 1+cost2/12
  s_powers <- s^(1:(N-1))
  u2N <- cost1*s_powers - cost3*(1-s_powers)/(1-s)
  DT$S <- c(cost1, u2N)
}
Let's compare:
> library(microbenchmark)
> N <- 2000
> DT <- data.frame(Index=1:N)
> microbenchmark(
+   f1 = f1(),
+   f2 = f2(),
+   f3 = f3(),
+   f4 = f4()
+ )
Unit: microseconds
 expr       min        lq       mean    median         uq        max neval cld
   f1 65802.386 67920.918 73168.4472 69025.145 70347.8050 180938.153   100   c
   f2 52641.373 54790.698 58553.8418 55916.565 57021.0145 163660.112   100  b 
   f3   375.736   396.932   458.5317   418.798   459.6295    974.593   100 a  
   f4   220.890   235.170   266.3977   240.971   259.9360   1318.199   100 a  
The winner is f4, the one which does not use recurrence.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With