Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Making nested for loops in R more efficient

I am working on a research project where I want to determine equivalence of two distributions. I am currently using the Mann-Whitney Test for Equivalence and the code I am running (below) was provided with the book Testing Statistical Hypotheses of Equivalence and Noninferiority by Stefan Wellek (2010). Before running my data I am testing this code with random normal distributions which have the same mean and standard deviation. My problem is that there are three nested for loops and when running larger distributions sizes (as in the example below) the code takes forever to run. If I only had to run it once that would not be such a problem, but I am doing a simulation test and creating power curves so I need to run many iterations of this code (around 10,000). At the moment, depending on how I alter the distribution sizes, it takes days to run 10,000 iterations.

Any help in a way to increase the performance of this would be greatly appreciated.

x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)

alpha <- 0.05
m <- length(x)
n <- length(y)
eps1_ <- 0.2 #0.1382 default
eps2_ <- 0.2 #0.2602 default

eqctr <- 0.5 + (eps2_-eps1_)/2 
eqleng <- eps1_ + eps2_

wxy <- 0
pihxxy <- 0
pihxyy <- 0

for (i in 1:m)
 for (j in 1:n)
  wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))

for (i in 1:m)
 for (j1 in 1:(n-1))
  for (j2 in (j1+1):n)
    pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))

for (i1 in 1:(m-1))
 for (i2 in (i1+1):m)
  for (j in 1:n)
    pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))

wxy <- wxy / (m*n)
pihxxy <- pihxxy*2 / (m*(m-1)*n)
pihxyy <- pihxyy*2 / (n*(n-1)*m)
sigmah <- sqrt((wxy-(m+n-1)*wxy**2+(m-1)*pihxxy+(n-1)*pihxyy)/(m*n))

crit <- sqrt(qchisq(alpha,1,(eqleng/2/sigmah)**2))

if (abs((wxy-eqctr)/sigmah) >= crit) rej <- 1
if (abs((wxy-eqctr)/sigmah) < crit)  rej <- 0

if (is.na(sigmah) || is.na(crit)) rej <- 1

MW_Decision <- rej

cat(" ALPHA =",alpha,"  M =",m,"  N =",n,"  EPS1_ =",eps1_,"  EPS2_ =",eps2_,
  "\n","WXY =",wxy,"  SIGMAH =",sigmah,"  CRIT =",crit,"  REJ=",MW_Decision)
like image 499
elaw10 Avatar asked Dec 25 '22 05:12

elaw10


1 Answers

See edit below for an even better suggestion

One simple suggestion to get a bit of a speed boost is to byte compile your code.

For example, I wrapped your code into a function starting from the alpha <- 0.05 line and ran it on my laptop. Simply byte compiling your current code, it runs twice as fast.

set.seed(1234)
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)

# f1 <- function(x,y){ ...your code...}

system.time(f1(x, y))
#   user  system elapsed 
# 33.249   0.008  33.278 

library(compiler)
f2 <- cmpfun(f1)

system.time(f2(x, y))

#   user  system elapsed 
# 17.162   0.002  17.170 

EDIT

I should add, this is the type of things that a different language would do much better than R. Have you looked at the Rcpp and the inline packages?

I've been curious to learn how to use them so I figured this was a good chance.

Here's a tweak of your code using the inline package and Fortran (since I'm more comfortable with that than C). It wasn't hard at all (provided you know Fortran or C); I just followed the examples listed in cfunction.

First, let's re-write your loops and compile them:

library(inline)

# Fortran code for first loop
loop1code <- "
   integer i,  j1,  j2
   real*8 tmp
   do i = 1, m
      do j1 = 1, n-1
         do j2 = j1+1, n
            tmp = x(i) - max(y(j1),y(j2))
            if (tmp > 0.) pihxyy = pihxyy + 1
         end do
      end do
   end do
"    
# Compile the code and turn loop into a function
loop1fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxyy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop1code, language="F95")

# Fortran code for second loop
loop2code <- "
   integer i1, i2,  j
   real*8 tmp
   do i1 = 1, m-1
      do i2 = i1+1, m
         do j = 1, n
            tmp = min(x(i1), x(i2)) - y(j)
            if (tmp > 0.) pihxxy = pihxxy + 1
         end do
      end do
   end do
"    
# Compile the code and turn loop into a function
loop2fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxxy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop2code, language="F95")

Now let's create a new function that uses these. So it's not too long, I'll just sketch the key parts I modified from your code:

f3 <- function(x, y){

  # ... code ...

# Remove old loop
## for (i in 1:m)
##  for (j1 in 1:(n-1))
##   for (j2 in (j1+1):n)
##     pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))

# Call new function from compiled code instead
pihxyy <- loop1fun(x, y, pihxyy, m, n)$pihxyy

# Remove second loop
## for (i1 in 1:(m-1))
##  for (i2 in (i1+1):m)
##   for (j in 1:n)
##     pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))

# Call new compiled function for second loop
pihxxy <- loop2fun(x, y, pihxxy, m, n)$pihxxy

# ... code ...
}

And now we run it and voila, we get a huge speed boost! :)

system.time(f3(x, y))
#   user  system elapsed 
    0.12    0.00    0.12 

I did check that it got the same results as your code, but you probably want to run some additional tests just in case.

like image 188
Gabe Avatar answered Jan 05 '23 04:01

Gabe