Speed up iterative loop calculation with R

Tags:

performance

for-loop

r

I have to speed up my script. It contains loops like this one:


DT <- data.frame(Index=1:20, A=c(10:29))

cost1 <- 3
cost2 <- 0.05
cost3 <- 50

DT$S[1] <- cost1
for (j in 2:(20)) {
  DT$S[j] <- DT$S[j-1]-cost3+DT$S[j-1]*cost2/12
}

Here cost1, cost2 and cost3 are constants. Is it possible to avoid writing a loop?


asked Jun 27 '18 07:06

stefanodv

2 Answers

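The main problem with your approach is that you repeatedly read and assign elements of a data.frame column (DT$S) inside the loop, which this calculation does not need. Each DT$S[j] <- ... goes through the data.frame replacement method ($<-.data.frame), which typically copies the data.frame and is far slower than indexing a plain vector. If we compute the values in a vector and add the result to the data.frame once at the end, it is much faster. We can also simplify the formula.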


n <- 1e4
DT <- data.frame(Index = 1:n, A = seq(10, by = 1, length.out = n))

cost1 <- 3
cost2 <- 0.05
cost3 <- 50

# Original loop; returns the column so it can be compared with my() below
your <- function() {
  DT$S[1] <- cost1
  for (j in 2:n) {
    DT$S[j] <- DT$S[j - 1] - cost3 + DT$S[j - 1]*cost2/12
  }
  DT$S
}
DT$S <- your()

My function:


my <- function() {
  cc <- 1 + cost2/12                  # combined growth factor
  r <- vector('numeric', length = n)  # preallocate the result vector
  r[1] <- cost1
  for (j in 2:n) {
    # same as r[j - 1] - cost3 + r[j - 1] * cost2/12
    r[j] <- r[j - 1] * cc - cost3
  }
  r
}

DT$S2 <- my()
all.equal(DT$S, DT$S2)
# [1] TRUE

microbenchmark::microbenchmark(your(), my(), times = 2)
# Unit: milliseconds
#   expr        min         lq      mean    median         uq        max neval cld
# your() 487.229621 487.229621 490.86917 490.86917 494.508715 494.508715     2   b
#   my()   1.515178   1.515178   1.59408   1.59408   1.672982   1.672982     2  a

answered Sep 25 '22 05:09

minem

Your column S is defined by a first-order linear recurrence, so the i-th term can be expressed as a function of i (see e.g. these slides).
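
Writing c1, c2, c3 for cost1, cost2, cost3 and s = 1 + c2/12, the loop body can be regrouped and unrolled as follows. The closed form divides by 1 - s, so it assumes cost2 != 0:

```latex
\begin{aligned}
S_1 &= c_1, \qquad S_j = S_{j-1} - c_3 + S_{j-1}\,\frac{c_2}{12} = s\,S_{j-1} - c_3, \qquad s = 1 + \frac{c_2}{12},\\
S_j &= c_1\, s^{j-1} - c_3 \sum_{k=0}^{j-2} s^k
     = c_1\, s^{j-1} - c_3\, \frac{1 - s^{j-1}}{1 - s}, \qquad j = 1, \dots, N.
\end{aligned}
```

For j = 1 the sum is empty and the formula returns c1, which matches the first row.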


> N <- 20
> DT <- data.frame(Index=1:N)
> cost1 <- 3; cost2 <- 0.05; cost3 <- 50
> DT$S[1] <- cost1
> for (j in 2:N) {
+   DT$S[j] <- DT$S[j-1]-cost3+DT$S[j-1]*cost2/12
+ }
> DT$S
 [1]    3.00000  -46.98750  -97.18328 -147.58821 -198.20316 -249.02901 -300.06663 -351.31691
 [9] -402.78073 -454.45898 -506.35256 -558.46236 -610.78929 -663.33424 -716.09814 -769.08188
[17] -822.28639 -875.71258 -929.36138 -983.23372
> s <- 1+cost2/12
> s_powers <- s^(1:(N-1))
> cost1*s_powers - cost3*(1-s_powers)/(1-s)
 [1]  -46.98750  -97.18328 -147.58821 -198.20316 -249.02901 -300.06663 -351.31691 -402.78073
 [9] -454.45898 -506.35256 -558.46236 -610.78929 -663.33424 -716.09814 -769.08188 -822.28639
[17] -875.71258 -929.36138 -983.23372
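
The transcript above compares only 20 terms by eye. The sketch below checks agreement over a longer sequence. It is a minimal, self-contained version in which the inputs are passed as arguments rather than read from globals. The helper names recur_loop and recur_closed are hypothetical and do not come from the answer.

```r
# Hypothetical helpers; inputs are passed as arguments instead of globals.
# Loop version (same idea as f3 below): fill a preallocated numeric vector.
recur_loop <- function(N, cost1, cost2, cost3) {
  s <- 1 + cost2 / 12
  u <- numeric(N)
  u[1] <- cost1
  for (j in seq_len(N)[-1]) {
    u[j] <- s * u[j - 1] - cost3
  }
  u
}

# Closed form (same idea as f4 below); assumes cost2 != 0 so that 1 - s != 0.
recur_closed <- function(N, cost1, cost2, cost3) {
  s <- 1 + cost2 / 12
  k <- seq_len(N) - 1  # exponents 0, 1, ..., N - 1; k = 0 yields cost1
  cost1 * s^k - cost3 * (1 - s^k) / (1 - s)
}

x_loop   <- recur_loop(2000, cost1 = 3, cost2 = 0.05, cost3 = 50)
x_closed <- recur_closed(2000, cost1 = 3, cost2 = 0.05, cost3 = 50)
all.equal(x_loop, x_closed)
# [1] TRUE
```

Starting the exponents at 0 folds the first term into the formula, so there is no need to prepend cost1 separately. The two results are compared with all.equal() rather than == because the loop and the power-based formula round differently.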

Let's compare four ways.


f1 <- function(){ # your way
  DT$S[1] <- cost1
  for (j in 2:N) {
    DT$S[j] <- DT$S[j-1]-cost3+DT$S[j-1]*cost2/12
  }
}
f2 <- function(){ # group the two DT$S[j-1] terms (accessing DT$S[j-1] is slow)
  DT$S[1] <- cost1
  for (j in 2:N) {
    DT$S[j] <- (1+cost2/12)*DT$S[j-1]-cost3
  }
}
f3 <- function(){ # avoid DT$S[j-1] (@minem's answer)
  u <- numeric(N)
  u[1] <- cost1
  for (j in 2:N) {
    u[j] <- (1+cost2/12)*u[j-1]-cost3
  }
  DT$S <- u
}
f4 <- function(){ # express DT$S[j] as a function of j (closed form)
  s <- 1+cost2/12
  s_powers <- s^(1:(N-1))
  u2N <- cost1*s_powers - cost3*(1-s_powers)/(1-s)
  DT$S <- c(cost1, u2N)
}

Let's compare:


> library(microbenchmark)
> N <- 2000
> DT <- data.frame(Index=1:N)
> microbenchmark(
+   f1 = f1(),
+   f2 = f2(),
+   f3 = f3(),
+   f4 = f4()
+ )
Unit: microseconds
 expr       min        lq       mean    median         uq        max neval cld
   f1 65802.386 67920.918 73168.4472 69025.145 70347.8050 180938.153   100   c
   f2 52641.373 54790.698 58553.8418 55916.565 57021.0145 163660.112   100  b 
   f3   375.736   396.932   458.5317   418.798   459.6295    974.593   100 a  
   f4   220.890   235.170   266.3977   240.971   259.9360   1318.199   100 a

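The winner is f4, the closed-form version that avoids the recurrence altogether. Note that f3 and f4 share the same cld letter (a), so the gap between them is small compared with the gain from keeping the data.frame out of the loop.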

answered Sep 22 '22 05:09

Stéphane Laurent
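
Neither answer uses it, but base R's stats::filter() with method = "recursive" computes exactly this kind of first-order linear recurrence, y[i] = x[i] + s * y[i - 1], in compiled code. The sketch below is minimal. The function name recur_filter is hypothetical, and its speed has not been benchmarked against f4 here.

```r
# Hypothetical helper: the recurrence S[j] = s * S[j - 1] - cost3 via stats::filter.
recur_filter <- function(N, cost1, cost2, cost3) {
  s <- 1 + cost2 / 12
  # Input sequence: cost1 seeds the first term, then -cost3 is added at each step.
  x <- c(cost1, rep(-cost3, N - 1))
  # The default init is 0, so y[1] = cost1 + s * 0 = cost1.
  # as.numeric() drops the "ts" class that filter() returns.
  as.numeric(stats::filter(x, filter = s, method = "recursive"))
}

S <- recur_filter(20, cost1 = 3, cost2 = 0.05, cost3 = 50)
head(S, 3)  # should match the first three values of DT$S printed above
```

Calling it as stats::filter avoids a clash with dplyr::filter when dplyr is attached. Unlike the closed form, this version does not divide by 1 - s, so it also works when cost2 is 0.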
