
Loops in R - Need to use index, anyway to avoid 'for'?

Tags: loops, for-loop, r

I know that using a for loop is not considered best practice in R because of its performance. In almost all cases, a function from the *apply family solves the problem.

However I'm facing a situation where I don't see a workaround.

I need to calculate percent variation for consecutive values:

pv <- numeric(length(x))  # pre-allocate the result vector
pv[1] <- 0
for(i in 2:length(x)) {
  pv[i] <- (x[i] - x[i-1])/x[i-1]
}

So, as you can see, I have to use not only the x[i] element but also the x[i-1] element. With the *apply functions, I only see how to use x[i]. Is there any way I can avoid the for loop?

João Daniel asked May 06 '12 01:05


2 Answers

You can get the same results with:

# compute each change from its index; no global assignment is needed
y <- sapply(2:length(x), function(i) (x[i] - x[i-1])/x[i-1])
pv <- c(0, y)

The performance problems that for loops once had have largely been optimized away. A for loop is often no slower than an apply solution and may even be faster. You have to test both and see. I'm betting your for loop is faster than my solution.

EDIT: To illustrate the for loop vs. apply comparison, as well as what DWin discusses about vectorization, I benchmarked the four solutions using microbenchmark on a Windows 7 machine.

Unit: microseconds
             expr     min      lq  median      uq       max
1    DIFF_Vincent  22.396  25.195  27.061  29.860  2073.848
2        FOR.LOOP 132.037 137.168 139.968 144.634 56696.989
3          SAPPLY 146.033 152.099 155.365 162.363  2321.590
4 VECTORIZED_Dwin  18.196  20.063  21.463  23.328   536.075
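
The expressions that were timed are not shown here, so the following is only a sketch of how such a comparison could be set up. The function names, the random test vector, and the guess that the diff-based entry relied on base R's diff() are all assumptions, and timings will differ by machine and R version.

```r
# Hypothetical test vector; strictly positive so no division by zero occurs
set.seed(1)
x <- runif(1000, min = 1, max = 100)

# Explicit loop with a pre-allocated result
for_loop <- function(x) {
  pv <- numeric(length(x))
  for (i in seq_along(x)[-1]) pv[i] <- (x[i] - x[i - 1]) / x[i - 1]
  pv
}

# Index-based sapply
sapply_version <- function(x) {
  c(0, sapply(seq_along(x)[-1], function(i) (x[i] - x[i - 1]) / x[i - 1]))
}

# Shifted-vector arithmetic
vectorized <- function(x) c(0, (x[-1] - x[-length(x)]) / x[-length(x)])

# Assumed diff()-based variant
diff_version <- function(x) c(0, diff(x) / x[-length(x)])

# All four should agree before they are timed
stopifnot(isTRUE(all.equal(for_loop(x), sapply_version(x))),
          isTRUE(all.equal(for_loop(x), vectorized(x))),
          isTRUE(all.equal(for_loop(x), diff_version(x))))

# Time them only if the microbenchmark package is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    for_loop(x), sapply_version(x), vectorized(x), diff_version(x),
    times = 100L
  ))
}
```

Checking that the variants return the same values first avoids comparing timings of functions that do different work.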


Tyler Rinker answered Oct 22 '22 08:10


What you computed is the fractional variation; multiplying by 100 gives the "percent variation":

pv <- vector("numeric", length(x))
pv[1] <- 0
pv[-1] <- 100 * (x[-1] - x[-length(x)]) / x[-length(x)]

This is a vectorized solution. (You should also note that for-loops are going to be just as slow as *apply solutions, just not as pretty. Always look for a vectorized approach.)

To explain a bit more: x[-length(x)] is the vector x[1:(length(x)-1)], and x[-1] is the vector x[2:length(x)]. The vector operations in R do the same work as your for-loop body, without an explicit loop. R first constructs the differences of those shifted vectors, x[-1] - x[-length(x)], and then divides by x[1:(length(x)-1)].
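
A small worked example may make the shifted vectors easier to see. The sample values in x and the names lagged and leading are made up for illustration; the checks only confirm that the vectorized form matches the loop from the question.

```r
# Hypothetical sample data, chosen only for illustration
x <- c(100, 110, 99, 120)

lagged  <- x[-length(x)]  # 100 110  99, same as x[1:(length(x) - 1)]
leading <- x[-1]          # 110  99 120, same as x[2:length(x)]

# Fractional change between consecutive values, with 0 for the first element
pv_vec <- c(0, (leading - lagged) / lagged)

# The explicit loop from the question, for comparison
pv_loop <- numeric(length(x))
for (i in 2:length(x)) {
  pv_loop[i] <- (x[i] - x[i - 1]) / x[i - 1]
}

# Both approaches give the same result
stopifnot(isTRUE(all.equal(pv_vec, pv_loop)))

# Base R's diff() returns the same consecutive differences
stopifnot(isTRUE(all.equal(diff(x), leading - lagged)))
```

Multiplying pv_vec by 100 turns the fractional changes into percentages.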

IRTFM answered Oct 22 '22 10:10