
Replace rbind in for-loop with lapply? (2nd circle of hell)

I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:

Some initialisations and a function definition:

columns <- 6   # defined up front because values and solution depend on it
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)

myfunction <- function(frame,columns){
   athing = 0
   if(columns == 5){
      athing = 100
   }
   else{
      athing = 1000
   }
   # works on a local copy of the global values vector
   values[columns+1] = athing
   return(values)
}

The problematic for-loop looks like this:

columns = 6
for(i in 1:nrow(myframe)){
   values <- myfunction(as.matrix(myframe[i,]), columns)
   values[columns+2] = i
   values[columns+3] = myframe[i,3]
   #more columns added with simple operations (i.e. sum)

   solution <- rbind(solution,values)
   #solution is a large matrix from outside the for-loop
}

The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to be too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.

myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])

I have not really come further than this, although I tried applying this very good introduction to parallel processing.

How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...

Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?

user3347232 asked Feb 12 '23 14:02



1 Answer

Using rbind in a loop like this is bad practice because each iteration enlarges your solution matrix and copies it to a new object, which is a very slow process and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list, just once at the end. This will look something like

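my.list <- vector("list", nrow(myframe))
for(i in seq_len(nrow(myframe))){
    # Create values for row i, as in the loop body from the question
    values <- myfunction(as.matrix(myframe[i,]), columns)
    values[columns+2] <- i
    values[columns+3] <- myframe[i,3]
    my.list[[i]] <- values
}
# Bind all rows in one call, then append them to the existing solution
solution <- rbind(solution, do.call(rbind, my.list))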
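
Since the question asks specifically about lapply, the same idea can be sketched without an explicit for-loop. This is only a minimal, self-contained illustration: the small myframe is toy data, and the myfunction defined here is a hypothetical stand-in for the real one (it only needs to return a numeric vector of length columns + 1). The names row_list and new_rows are made up for this example.

```r
# Toy data mirroring the shape of myframe in the question
myframe <- data.frame(a = c(10, 20, 30, 40),
                      b = c("a", "b", "c", "d"),
                      c = c(1, 2, 3, 4))
columns <- 6

# Hypothetical stand-in for the question's myfunction: returns a numeric
# vector of length columns + 1 whose last element is 100 or 1000
myfunction <- function(frame, columns) {
  values <- numeric(columns)
  values[columns + 1] <- if (columns == 5) 100 else 1000
  values
}

# Build one result row per row of myframe; lapply collects them in a list
row_list <- lapply(seq_len(nrow(myframe)), function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
})

# A single rbind over the whole list instead of one rbind per iteration
new_rows <- do.call(rbind, row_list)
dim(new_rows)
```

If solution already holds rows from elsewhere, the new rows can be appended once with solution <- rbind(solution, new_rows). The anonymous function does not modify anything outside itself, so the loop body should also be straightforward to hand to a parallel backend, although that is not shown here.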
konvas answered Feb 14 '23 10:02
