
R: getting rid of a for loop and speeding up code

I would like to speed up my calculations and get the same result without the for loop in function m. Reproducible example:

N <- 2500
n <- 500
r <- replicate(1000, sample(N, n))

m <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = N)
  for (i in 1:ncol(r)) { 
    p <- r[, i]
    ic[p, p] <- ic[p, p] + 1
  }
  ic
}

system.time(ic <- m(r, N))
#  user  system elapsed 
#  6.25    0.51    6.76 
isSymmetric(ic)
# [1] TRUE

Each iteration of the for loop updates a submatrix rather than a vector, so how could this be vectorized?

@joel.wilson The purpose of this function is to count how often each pair of elements appears together in a sample, so that pairwise inclusion probabilities can be estimated afterwards.
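To make the stated purpose concrete, here is a minimal sketch of how the counts could be turned into estimated inclusion probabilities. It assumes each column of r is a simple random sample without replacement (as produced by sample(N, n)); the small N, n, the seed, and the names B, counts, pi_hat, pi_i and pi_ij are illustrative choices, not part of the original question.

```r
# Sketch: estimate pairwise inclusion probabilities from co-occurrence counts.
set.seed(1)   # assumption: fixed seed, only so the run is reproducible
N <- 6        # small population so the result is easy to inspect
n <- 3        # sample size
B <- 2000     # number of replicated samples (hypothetical name)
r <- replicate(B, sample(N, n))

# Same loop as in the question: add 1 to every pair that shares a sample
m <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = N)
  for (i in 1:ncol(r)) {
    p <- r[, i]
    ic[p, p] <- ic[p, p] + 1
  }
  ic
}

counts <- m(r, N)
pi_hat <- counts / ncol(r)   # diagonal: first-order, off-diagonal: pairwise

# Theoretical values under simple random sampling without replacement
pi_i  <- n / N                          # 0.5
pi_ij <- n * (n - 1) / (N * (N - 1))    # 0.2

round(diag(pi_hat), 2)               # should be close to pi_i
round(pi_hat[upper.tri(pi_hat)], 2)  # should be close to pi_ij
```

The diagonal of the count matrix holds how often each element was sampled, so dividing by the number of samples gives first-order and pairwise estimates in one step.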

Thanks to @Khashaa and @alexis_laz. Benchmarks:

> require(rbenchmark)
> benchmark(m(r, N),
+           m1(r, N),
+           mvec(r, N),
+           alexis(r, N),
+           replications = 10, order = "elapsed")
          test replications elapsed relative user.self sys.self user.child sys.child
4 alexis(r, N)           10    4.73    1.000      4.63     0.11         NA        NA
3   mvec(r, N)           10    5.36    1.133      5.18     0.18         NA        NA
2     m1(r, N)           10    5.48    1.159      5.29     0.19         NA        NA
1      m(r, N)           10   61.41   12.983     60.43     0.90         NA        NA
minem asked Nov 29 '16 10:11




1 Answer

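This should be significantly faster because it avoids repeated double indexing into the N x N matrix. Instead it fills an N x ncol(r) indicator matrix (entry [j, i] is 1 if element j is in sample i) and gets all pairwise counts from a single tcrossprod: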

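m1 <- function(r, N) {
  # indicator matrix: one row per element, one column per sample
  ic <- matrix(0, nrow = N, ncol = ncol(r))
  for (i in seq_len(ncol(r))) {
    p <- r[, i]
    ic[p, i] <- 1   # mark the elements drawn in sample i
  }
  # ic %*% t(ic): entry [j, k] counts the samples containing both j and k
  tcrossprod(ic)
}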

system.time(ic1 <- m1(r, N))
#   user  system elapsed 
#   0.53    0.01    0.55  

all.equal(ic, ic1)
# [1] TRUE
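Why the cross-product gives the same counts can be seen on a tiny case. The sketch below uses two hand-written samples over four elements; the names r_small, N_small, A and co are made up for the illustration.

```r
# Two samples of size 2 from elements 1..4: {1, 2} and {2, 3}
r_small <- cbind(c(1, 2), c(2, 3))
N_small <- 4

# Indicator matrix: A[j, i] is 1 when element j is in sample i
A <- matrix(0, nrow = N_small, ncol = ncol(r_small))
for (i in seq_len(ncol(r_small))) {
  A[r_small[, i], i] <- 1
}

# tcrossprod(A) is A %*% t(A): entry [j, k] sums A[j, i] * A[k, i] over
# samples i, which is the number of samples containing both j and k
co <- tcrossprod(A)
co
#      [,1] [,2] [,3] [,4]
# [1,]    1    1    0    0
# [2,]    1    2    1    0
# [3,]    0    1    1    0
# [4,]    0    0    0    0
```

Element 2 appears in both samples (diagonal entry 2), it shares one sample each with elements 1 and 3, and element 4 is never drawn.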

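Simple "counting/adding" operations can almost always be vectorized. Here the loop is replaced by a single assignment through a two-column index matrix of (row, column) pairs: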

mvec <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol=ncol(r))
  i <- rep(1:ncol(r), each=nrow(r))
  ic[cbind(as.vector(r), i)] <- 1
  tcrossprod(ic)
}
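As a sanity check, the vectorized version can be compared against the original loop. The sketch below repeats both definitions so it runs on its own and uses a smaller, arbitrarily chosen problem size and seed to keep it quick; ic_loop, ic_vec and same are illustrative names.

```r
set.seed(42)   # assumption: fixed seed, only for reproducibility
N <- 200
n <- 40
r <- replicate(100, sample(N, n))

# Loop version from the question
m <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = N)
  for (i in 1:ncol(r)) {
    p <- r[, i]
    ic[p, p] <- ic[p, p] + 1
  }
  ic
}

# Vectorized version: fill the indicator matrix with one matrix-indexed
# assignment, then take the cross-product
mvec <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = ncol(r))
  i <- rep(1:ncol(r), each = nrow(r))
  ic[cbind(as.vector(r), i)] <- 1
  tcrossprod(ic)
}

ic_loop <- m(r, N)
ic_vec  <- mvec(r, N)
same <- isTRUE(all.equal(ic_loop, ic_vec))
same
```

Note that the indicator matrix needs N x ncol(r) doubles on top of the N x N result, so for the sizes in the question (2500 x 1000) it takes roughly 20 MB of extra memory.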
Khashaa answered Sep 30 '22 20:09