I am new to R (Revolution Analytics R) and have been translating some Matlab functions into R.
Question: Why is the function GRPdur(n) so slow?
GRPdur = function(n){
#
# Durstenfeld's Permute algorithm, CACM 1964
# generates a random permutation of {1,2,...n}
#
p=1:n # start with identity p
for (k in seq(n,2,-1)){
r = 1+floor(runif(1)*k); # random integer between 1 and k
tmp = p[k];
p[k] = p[r]; # Swap(p(r),p(k)).
p[r] = tmp;
}
return(p)
}
Here is what I get on a Dell Precision 690, 2xQuadcore Xeon 5345 @ 2.33 GHz, Windows 7 64-bit:
> system.time(GRPdur(10^6))
user system elapsed
15.30 0.00 15.32
> system.time(sample(10^6))
user system elapsed
0.03 0.00 0.03
Here is what I get in Matlab 2011b
>> tic;p = GRPdur(10^6);disp(toc)
0.1364
tic;p = randperm(10^6);disp(toc)
0.1116
Here is what I get in Matlab 2008a
>> tic;p=GRPdur(10^6);toc
Elapsed time is 0.124169 seconds.
>> tic;p=randperm(10^6);toc
Elapsed time is 0.211372 seconds.
>>
LINKS : GRPdur is part of RPGlab, a package of Matlab functions that I wrote that generates and tests various random permutation generators. The notes can be viewed separately here: Notes on RPGlab.
The original Durstenfeld Algol program is here
Both Matlab and S (later R) started out as thin wrappers around FORTRAN functions for doing math stuff.
In S/R the for-loops have "always" been slow, but that has been OK because there are usually vectorized ways of expressing the problem. Also, R has thousands of functions in Fortran or C that do higher-level things quickly. For instance, the sample
function which does exactly what your for-loop does - but much more quickly.
So why then is MATLAB much better at executing scripted for-loops? Two simple reasons: RESOURCES and PRIORITIES.
MathWorks who make MATLAB is a rather big company with around 2000 employees. They decided years ago to prioritize improving the performance of scripts. They hired a bunch of compiler experts and spent years developing a Just-In-Time compiler (JIT) that takes the script code and turns it into assembler code. They did a very good job too. Kudos to them!
R is open source, and the R core team works on improving R in their spare time. Luke Tierney of R core has worked hard and developed a compiler package for R that compiles R scripts to byte code. It does NOT turn it into assembler code however, but works pretty well. Kudos to him!
...But the amount of effort put into the R compiler vs. the MATLAB compiler is simply much less, and therefore the result is slower:
system.time(GRPdur(10^6)) # 9.50 secs
# Compile the function...
f <- compiler::cmpfun(GRPdur)
system.time(f(10^6)) # 3.69 secs
As you can see, the for-loop became 3x faster by compiling it to byte code. Another difference is that the R JIT compiler is not enabled by default as it is in MATLAB.
UPDATE Just for the record, a slightly more optimized R version (based on Knuth's algorithm), where the random generation has been vectorized as @joran suggested:
f <- function(n) {
p <- integer(n)
p[1] <- 1L
rv <- runif(n, 1, 1:n) # random integer between 1 and k
for (k in 2:n) {
r <- rv[k]
p[k] = p[r] # Swap(p(r),p(k)).
p[r] = k
}
p
}
g <- compiler::cmpfun(f)
system.time(f(1e6)) # 4.84
system.time(g(1e6)) # 0.98
# Compare to Joran's version:
system.time(GRPdur1(10^6)) # 6.43
system.time(GRPdur2(10^6)) # 1.66
...still a magnitude slower than MATLAB. But again, just use sample
or sample.int
which apparently beats MATLAB's randperm
by 3x!
system.time(sample.int(10^6)) # 0.03
Because you wrote a c-program in an R-skin
n = 10^6L
p = 1:n
system.time( sample(p,n))
0.03 0.00 0.03
Responding to the OP's request was too long to fit in a comment, so here's what I was referring to:
#Create r outside for loop
GRPdur1 <- function(n){
p <- 1:n
k <- seq(n,2,-1)
r <- 1 + floor(runif(length(k)) * k)
for (i in 1:length(k)){
tmp <- p[k[i]];
p[k[i]] <- p[r[i]];
p[r[i]] <- tmp;
}
return(p)
}
library(compiler)
GRPdur2 <- cmpfun(GRPdur1)
set.seed(1)
out1 <- GRPdur(100)
set.seed(1)
out2 <- GRPdur1(100)
#Check the GRPdur1 is generating the identical output
identical(out1,out2)
system.time(GRPdur(10^6))
user system elapsed
12.948 0.389 13.232
system.time(GRPdur2(10^6))
user system elapsed
1.908 0.018 1.910
Not quite 10x, but more than the 3x Tommy showed just using the compiler. For a somewhat more accurate timing:
library(rbenchmark)
benchmark(GRPdur(10^6),GRPdur2(10^6),replications = 10)
test replications elapsed relative user.self sys.self
1 GRPdur(10^6) 10 127.315 6.670946 124.358 3.656
2 GRPdur2(10^6) 10 19.085 1.000000 19.040 0.222
So the 10x comment was (perhaps not surprisingly, being based on a single system.time
run) optimistic, but the vectorization gains you a fair bit more speed over what the byte compiler does.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With