Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is R slow on this random permutation function?

I am new to R (Revolution Analytics R) and have been translating some Matlab functions into R.

Question: Why is the function GRPdur(n) so slow?

GRPdur = function(n){
#
# Durstenfeld's Permute algorithm, CACM 1964
# generates a random permutation of {1,2,...n}
#
p=1:n                           # start with identity p
for (k in seq(n,2,-1)){    
    r    = 1+floor(runif(1)*k); # random integer between 1 and k
    tmp  = p[k];
    p[k] = p[r];                #  Swap(p(r),p(k)).
    p[r] = tmp;                  
} 
return(p)
}

Here is what I get on a Dell Precision 690, 2xQuadcore Xeon 5345 @ 2.33 GHz, Windows 7 64-bit:

> system.time(GRPdur(10^6))
   user  system elapsed 
  15.30    0.00   15.32 
> system.time(sample(10^6))
   user  system elapsed 
   0.03    0.00    0.03 

Here is what I get in Matlab 2011b

>> tic;p = GRPdur(10^6);disp(toc)
    0.1364   

 tic;p = randperm(10^6);disp(toc)
    0.1116

Here is what I get in Matlab 2008a

>> tic;p=GRPdur(10^6);toc
Elapsed time is 0.124169 seconds.
>> tic;p=randperm(10^6);toc
Elapsed time is 0.211372 seconds.
>> 

LINKS : GRPdur is part of RPGlab, a package of Matlab functions that I wrote that generates and tests various random permutation generators. The notes can be viewed separately here: Notes on RPGlab.

The original Durstenfeld Algol program is here

like image 583
Derek O'Connor Avatar asked Jan 14 '12 18:01

Derek O'Connor


3 Answers

Both Matlab and S (later R) started out as thin wrappers around FORTRAN functions for doing math stuff.

In S/R the for-loops have "always" been slow, but that has been OK because there are usually vectorized ways of expressing the problem. Also, R has thousands of functions in Fortran or C that do higher-level things quickly. For instance, the sample function which does exactly what your for-loop does - but much more quickly.

So why then is MATLAB much better at executing scripted for-loops? Two simple reasons: RESOURCES and PRIORITIES.

MathWorks who make MATLAB is a rather big company with around 2000 employees. They decided years ago to prioritize improving the performance of scripts. They hired a bunch of compiler experts and spent years developing a Just-In-Time compiler (JIT) that takes the script code and turns it into assembler code. They did a very good job too. Kudos to them!

R is open source, and the R core team works on improving R in their spare time. Luke Tierney of R core has worked hard and developed a compiler package for R that compiles R scripts to byte code. It does NOT turn it into assembler code however, but works pretty well. Kudos to him!

...But the amount of effort put into the R compiler vs. the MATLAB compiler is simply much less, and therefore the result is slower:

system.time(GRPdur(10^6)) # 9.50 secs

# Compile the function...
f <- compiler::cmpfun(GRPdur)
system.time(f(10^6)) # 3.69 secs

As you can see, the for-loop became 3x faster by compiling it to byte code. Another difference is that the R JIT compiler is not enabled by default as it is in MATLAB.

UPDATE Just for the record, a slightly more optimized R version (based on Knuth's algorithm), where the random generation has been vectorized as @joran suggested:

f <- function(n) {
  p <- integer(n)
  p[1] <- 1L
  rv <- runif(n, 1, 1:n) # random integer between 1 and k
  for (k in 2:n) {    
    r <- rv[k]
    p[k] = p[r]         #  Swap(p(r),p(k)).
    p[r] = k
  }
  p
}
g <- compiler::cmpfun(f)
system.time(f(1e6)) # 4.84
system.time(g(1e6)) # 0.98

# Compare to Joran's version:
system.time(GRPdur1(10^6)) # 6.43
system.time(GRPdur2(10^6)) # 1.66

...still a magnitude slower than MATLAB. But again, just use sample or sample.int which apparently beats MATLAB's randperm by 3x!

system.time(sample.int(10^6)) # 0.03
like image 178
Tommy Avatar answered Sep 29 '22 08:09

Tommy


Because you wrote a c-program in an R-skin

n = 10^6L
p = 1:n
system.time( sample(p,n))
0.03    0.00    0.03
like image 32
Dieter Menne Avatar answered Sep 29 '22 07:09

Dieter Menne


Responding to the OP's request was too long to fit in a comment, so here's what I was referring to:

#Create r outside for loop
GRPdur1 <- function(n){
    p <- 1:n                          
    k <- seq(n,2,-1)
    r <- 1 + floor(runif(length(k)) * k)
    for (i in 1:length(k)){    
        tmp <- p[k[i]];
        p[k[i]] <- p[r[i]];                
        p[r[i]] <- tmp;                  
    } 
return(p)
}

library(compiler)
GRPdur2 <- cmpfun(GRPdur1)

set.seed(1)
out1 <- GRPdur(100)

set.seed(1)
out2 <- GRPdur1(100)

#Check the GRPdur1 is generating the identical output
identical(out1,out2)

system.time(GRPdur(10^6))
   user  system elapsed 
 12.948   0.389  13.232 
system.time(GRPdur2(10^6))
   user  system elapsed 
 1.908   0.018   1.910

Not quite 10x, but more than the 3x Tommy showed just using the compiler. For a somewhat more accurate timing:

library(rbenchmark)
benchmark(GRPdur(10^6),GRPdur2(10^6),replications = 10)
           test replications elapsed relative user.self sys.self 
1  GRPdur(10^6)           10 127.315 6.670946   124.358    3.656         
2 GRPdur2(10^6)           10  19.085 1.000000    19.040    0.222

So the 10x comment was (perhaps not surprisingly, being based on a single system.time run) optimistic, but the vectorization gains you a fair bit more speed over what the byte compiler does.

like image 36
joran Avatar answered Sep 29 '22 07:09

joran