Does the order of loops matter for speed in R?

Tags:

r

I want to run a simulation study where the simulation depends on two variables, x and y. x and y are vectors of candidate values, and I want to evaluate every combination of them. Furthermore, for each combination of x and y I want multiple replicates, since there is a stochastic term in the simulation and each run will vary.

To give an idea of what I am dealing with, here is a simplified example:

x = 1:10
y = 11:20

iterations = 2000

# one length-3 result per (x, y, iteration) combination
solution = array(NA, c(length(x), length(y), 3, iterations))
for (i in seq_along(x)) {
    for (j in seq_along(y)) {
        for (k in 1:iterations) {
            z = rnorm(3) + c(x[i], y[j], 1)
            solution[i, j, , k] = z
        }
    }
}

In my actual problem, however, the code evaluated inside the loop is much less trivial. The structure of my inputs and of the output is the same, though.

So what I would like to know, using the above example, is: is it most efficient to set up the loops in this order, or would it be better to make k in 1:iterations the outermost loop and use something like outer() inside that single loop, since I am evaluating the function (z in this example) over the grid of x and y?

Also, I am very open to a completely different set-up and design. At the end of the day I want to be able to obtain a solution that is indexed by x and y and averaged over all the iterations, i.e., apply(solution, c(1, 2, 3), mean).
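To make that alternative concrete, here is a rough, untested sketch of what I mean, reusing x, y, and iterations from above. The component-wise split is just my guess at how outer() could be used here, since outer() needs a function that returns one value per (x, y) pair:

# Untested sketch: k outermost, with the (x, y) grid evaluated inside
# each iteration. outer() applies its function elementwise over the
# expanded grid, so the three components of z are built separately.
solution = array(NA, c(length(x), length(y), 3, iterations))
for (k in 1:iterations) {
    noise = array(rnorm(length(x) * length(y) * 3),
                  c(length(x), length(y), 3))
    solution[, , 1, k] = noise[, , 1] + outer(x, y, function(a, b) a)
    solution[, , 2, k] = noise[, , 2] + outer(x, y, function(a, b) b)
    solution[, , 3, k] = noise[, , 3] + 1
}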


Edit:

As suggested, here is the actual code I am using.

library(survival)

iter = 2000   # replicates per (vecX, vecY) combination
n = 120       # total sample size
base = 2      # T:C allocation ratio
hr = 0.5      # hazard ratio
m.x = 3       # median survival, "C" arm
m.y = m.x/hr  # median survival, "T" arm

ANS = NULL
for (vecX in c(0.3, 0.5, 0.6, 0.7)) {
    out = NULL

    for (vecY in c(0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95)) {
        m.x.p = m.x/vecX
        m.y.p = m.y/vecX
        m.x.n = m.x
        m.y.n = m.y

        n.t = round(n*base/(base + 1))
        n.c = n - n.t

        for (ii in 1:iter) {
            # split each arm into "P" and "N" subgroups
            n.t.p = rbinom(1, n.t, vecY)
            n.t.n = n.t - n.t.p
            n.c.p = rbinom(1, n.c, vecY)
            n.c.n = n.c - n.c.p

            # exponential survival times; rate log(2)/m gives median m
            S = c(rexp(n.t.p, log(2)/m.y.p), rexp(n.t.n, log(2)/m.y.n),
                  rexp(n.c.p, log(2)/m.x.p), rexp(n.c.n, log(2)/m.x.n))

            data1 = data.frame(Group = c(rep("T", n.t), rep("C", n.c)),
                               dx = c(rep("P", n.t.p), rep("N", n.t.n),
                                      rep("P", n.c.p), rep("N", n.c.n)),
                               S)

            fit = survfit(Surv(data1$S) ~ data1$Group)
            coxfit = coxph(Surv(data1$S) ~ data1$Group)

            HR = exp(coxfit$coefficients)
            p.val = summary(coxfit)$logtest["pvalue"]

            out = rbind(out, c(vecX, vecY, n.t.p, n.t.n, n.c.p, n.c.n, HR, p.val))
        }
    }
    colnames(out) = c("vecX", "vecY", "n.t.p", "n.t.n", "n.c.p", "n.c.n", "HR", "p.val")

    ans = as.data.frame(out)
    ANS = rbind(ANS, ans)
}
asked Mar 24 '16 by RustyStatistician


2 Answers

Yes, I believe that in theory it should make a difference (see example below).

R stores matrices in column-major order, like Fortran (and unlike C), so to minimize cache misses you want to traverse down columns. For filling a matrix, the optimal approach is therefore the one where the outer loop runs over the column index and the inner loop over the row index.

You'd want to bear this in mind for n-dimensional arrays, too. In the case n = 3, I believe this means making the layer (the third index) the outermost loop, then the column, then the row, since the first index varies fastest in memory. I could be mistaken here, though (see the sketch after the matrix example below).

I ran this quick example with 5000-by-5000 matrices, and we see a difference of about 50 seconds, with fill_matrix2() being the faster of the two:

    n <- 5000
    A <- matrix(NA, n, n)
    B <- matrix(NA, n, n)

    # row index in the outer loop: strides across columns, poor locality
    fill_matrix1 <- function(X, val) {
        for (i in 1:nrow(X)) {
            for (j in 1:ncol(X)) {
                X[i, j] <- val
            }
        }
        return(X)
    }

    # column index in the outer loop: walks down each column, good locality
    fill_matrix2 <- function(X, val) {
        for (j in 1:ncol(X)) {
            for (i in 1:nrow(X)) {
                X[i, j] <- val
            }
        }
        return(X)
    }

    system.time(fill_matrix1(A, 0))
    system.time(fill_matrix2(B, 0))
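
And here is a minimal sketch of how one could check the three-dimensional case I guessed at above. The function names are mine, and the relative timing is an expectation from the same cache argument rather than something I have measured:

    # Sketch for n = 3: the first index varies fastest in memory, so the
    # third index ("layer") goes in the outermost loop for sequential writes.
    n3 <- 200
    A3 <- array(NA, c(n3, n3, n3))
    B3 <- array(NA, c(n3, n3, n3))

    fill_array_fast <- function(X, val) {
        for (k in 1:dim(X)[3]) {          # layer: slowest-varying index
            for (j in 1:dim(X)[2]) {      # column
                for (i in 1:dim(X)[1]) {  # row: fastest-varying index
                    X[i, j, k] <- val
                }
            }
        }
        return(X)
    }

    fill_array_slow <- function(X, val) {
        for (i in 1:dim(X)[1]) {          # row outermost: poor locality
            for (j in 1:dim(X)[2]) {
                for (k in 1:dim(X)[3]) {
                    X[i, j, k] <- val
                }
            }
        }
        return(X)
    }

    system.time(fill_array_fast(A3, 0))
    system.time(fill_array_slow(B3, 0))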
answered Nov 13 '22 by paulstey

The order of the loops is practically irrelevant here. If you profile your code (see help("Rprof")) you'll see that the CPU time is spent in functions like survfit and coxph, and of course in growing out, which you should avoid: pre-allocate out to its final size and fill it row by row instead of growing it with rbind().
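To see just the allocation effect in isolation, here is a self-contained sketch (not your simulation; grow() and prealloc() are hypothetical names):

    # Sketch: rbind() copies the whole matrix on every call, so growing a
    # matrix row by row is quadratic; pre-allocating and filling is linear.
    n <- 10000

    grow <- function(n) {
        out <- NULL
        for (i in 1:n) out <- rbind(out, c(i, sqrt(i)))
        out
    }

    prealloc <- function(n) {
        out <- matrix(NA_real_, nrow = n, ncol = 2)
        for (i in 1:n) out[i, ] <- c(i, sqrt(i))
        out
    }

    system.time(grow(n))      # slow: repeated copying
    system.time(prealloc(n))  # fast: single allocation, filled in place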

answered Nov 13 '22 by Roland