I would like to speed up my calculations and obtain the result without the loop in function `m`. Reproducible example:
N <- 2500
n <- 500
r <- replicate(1000, sample(N, n))

m <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = N)
  for (i in 1:ncol(r)) {
    p <- r[, i]
    ic[p, p] <- ic[p, p] + 1
  }
  ic
}
system.time(ic <- m(r, N))
# user system elapsed
# 6.25 0.51 6.76
isSymmetric(ic)
# [1] TRUE
In every iteration of the for loop we are dealing with a matrix, not a vector, so how could this be vectorized?
@joel.wilson The purpose of this function is to calculate pairwise frequencies of elements, so that we can then estimate pairwise inclusion probabilities.
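To make that purpose concrete, here is a minimal sketch with toy sizes chosen purely for illustration (the function `m()` is reproduced from the question): `ic[i, j]` counts how many samples contain both units `i` and `j`, so dividing by the number of samples gives an estimate of the pairwise inclusion probabilities.

```r
# m() reproduced from the question, run on toy sizes
m <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = N)
  for (i in 1:ncol(r)) {
    p <- r[, i]
    ic[p, p] <- ic[p, p] + 1
  }
  ic
}

set.seed(1)
N_small <- 5
r_small <- replicate(3, sample(N_small, 2))  # 3 samples of size 2
ic_small <- m(r_small, N_small)
# ic_small[i, j] = number of samples containing both i and j;
# the diagonal counts how often each unit appeared at all.
ic_small / ncol(r_small)  # estimated pairwise inclusion probabilities
```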
Thanks to @Khashaa and @alexis_laz. Benchmarks:
> require(rbenchmark)
> benchmark(m(r, N),
+ m1(r, N),
+ mvec(r, N),
+ alexis(r, N),
+ replications = 10, order = "elapsed")
test replications elapsed relative user.self sys.self user.child sys.child
4 alexis(r, N) 10 4.73 1.000 4.63 0.11 NA NA
3 mvec(r, N) 10 5.36 1.133 5.18 0.18 NA NA
2 m1(r, N) 10 5.48 1.159 5.29 0.19 NA NA
1 m(r, N) 10 61.41 12.983 60.43 0.90 NA NA
This should be significantly faster, as it avoids the repeated double-indexing assignments into the large N × N matrix and replaces them with a single `tcrossprod()`:
m1 <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = ncol(r))
  for (i in 1:ncol(r)) {
    p <- r[, i]
    ic[, i][p] <- 1
  }
  tcrossprod(ic)
}
system.time(ic1 <- m1(r, N))
# user system elapsed
# 0.53 0.01 0.55
all.equal(ic, ic1)
# [1] TRUE
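The identity behind this answer: let `A` be the N × k indicator matrix with `A[i, s] = 1` exactly when unit `i` is in sample `s`. Then `(A %*% t(A))[i, j]` is the number of samples containing both `i` and `j`, which is precisely what `m()` counts. A small self-contained check (both functions reproduced from the thread, with `m1`'s assignment written in the equivalent `ic[r[, i], i] <- 1` form; toy sizes chosen for speed):

```r
# Original version: double-indexed accumulation into an N x N matrix
m <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = N)
  for (i in 1:ncol(r)) {
    p <- r[, i]
    ic[p, p] <- ic[p, p] + 1
  }
  ic
}

# Indicator-matrix version: build A, then A %*% t(A) via tcrossprod()
m1 <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = ncol(r))
  for (i in 1:ncol(r)) ic[r[, i], i] <- 1
  tcrossprod(ic)
}

set.seed(42)
r_toy <- replicate(50, sample(100, 10))
all.equal(m(r_toy, 100), m1(r_toy, 100))
# [1] TRUE
```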
Simple "counting/adding" operations like this can almost always be vectorized:
mvec <- function(r, N) {
  ic <- matrix(0, nrow = N, ncol = ncol(r))
  i <- rep(1:ncol(r), each = nrow(r))
  ic[cbind(as.vector(r), i)] <- 1
  tcrossprod(ic)
}