
R Code Taking Too Long To Run

I have the following code running, and it's taking a long time to finish. How do I know whether it's still doing its job or has gotten stuck somewhere?

noise4<-NULL;
for(i in 1:length(noise3))
{
    if(is.na(noise3[i])==TRUE)
    {
    next;
    }
    else
    {
    noise4<-c(noise4,noise3[i]);
    }
}

noise3 is a vector with 2418233 data points.

Concerned_Citizen asked Nov 28 '22 17:11


2 Answers

You just want to remove the NA values. Do it like this:

noise4 <- noise3[!is.na(noise3)]

This will be pretty much instant.

Or as Joshua suggests, a more readable alternative:

noise4 <- na.omit(noise3)
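The two forms return the same values, but they are not byte-for-byte identical objects. A minimal sketch on a small stand-in vector (x is a made-up example, not the data from the question):

```r
# Small stand-in for noise3 (the real vector has 2418233 points)
x <- c(1.5, NA, -0.3, 2.0, NA)

by_index <- x[!is.na(x)]   # plain numeric vector
by_omit  <- na.omit(x)     # same values, plus bookkeeping attributes

# The values agree once the attributes are dropped
stopifnot(identical(by_index, as.vector(by_omit)))

# na.omit() records which positions were removed
attr(by_omit, "na.action")
```

If the extra attributes get in the way later (for example in identical() comparisons), the logical-index form avoids them.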

Your code was slow because:

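  1. It uses an explicit loop, which tends to be slow under the R interpreter.
  2. It reallocates memory on every iteration: c(noise4, noise3[i]) builds a new, longer vector each time, so the total copying grows roughly with the square of the input length.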

The memory reallocation is probably the biggest handicap to your code.
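One way to see the reallocation cost is to time the append-in-a-loop approach at two input sizes. This is only a sketch: grow_filter is a hypothetical wrapper around the loop from the question, the inputs are far smaller than the real data so it finishes quickly, and the timings will vary by machine and R version.

```r
# Append-in-a-loop version from the question, wrapped in a function
# (`grow_filter` is a hypothetical name used only for this illustration)
grow_filter <- function(x) {
  out <- NULL
  for (i in seq_along(x)) {
    if (!is.na(x[i])) out <- c(out, x[i])  # c() builds a new, longer vector each time
  }
  out
}

set.seed(1)
small <- rnorm(10000); small[sample(10000, 20)] <- NA
large <- rnorm(20000); large[sample(20000, 40)] <- NA

system.time(res_small <- grow_filter(small))
system.time(res_large <- grow_filter(large))    # twice the input; expect noticeably more than 2x the time
system.time(res_vec   <- large[!is.na(large)])  # vectorized filter for comparison

# Both approaches produce the same result
stopifnot(identical(res_large, res_vec))
```

Because the cost of each append depends on how long the result already is, the gap widens quickly as the input grows toward the 2418233 points in the question.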

David Heffernan answered Dec 01 '22 06:12


I wanted to illustrate the benefits of pre-allocation, so I tried to run your code... but I killed it after ~5 minutes. I recommend you use noise4 <- na.omit(noise3) as I said in my comments. This code is solely for illustrative purposes.

# Create some random data
set.seed(21)
noise3 <- rnorm(2418233)
noise3[sample(2418233, 100)] <- NA

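noise <- function(noise3) {
  # Pre-allocate
  noise4 <- vector("numeric", sum(!is.na(noise3)))
  j <- 0L  # number of values written to noise4 so far
  for(i in seq_along(noise3)) {
    if(is.na(noise3[i])) {
      next
    } else {
      # Write to the next free slot; indexing by i would leave gaps
      # and grow noise4 past its pre-allocated length
      j <- j + 1L
      noise4[j] <- noise3[i]
    }
  }
  noise4  # return the filtered vector
}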

system.time(noise(noise3)) # MUCH less than 5+ minutes
#    user  system elapsed 
#    9.50    0.44    9.94 

# Let's see what we gain from compiling
library(compiler)
cnoise <- cmpfun(noise)
system.time(cnoise(noise3))  # a decent reduction
#    user  system elapsed 
#    3.46    0.49    3.96 
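
On R 3.4.0 and later, the byte-compiler's JIT is enabled by default, so closures are typically compiled automatically and an explicit cmpfun() call may yield a smaller gain than the timings shown above. A small sketch for checking the JIT level and confirming that compilation does not change results (count_non_na is a hypothetical helper, not part of the original code):

```r
library(compiler)

# A negative argument reports the current JIT level without changing it
jit_level <- enableJIT(-1)
jit_level

# Hypothetical helper: count non-missing values with an explicit loop
count_non_na <- function(x) {
  n <- 0L
  for (i in seq_along(x)) {
    if (!is.na(x[i])) n <- n + 1L
  }
  n
}

# Explicitly byte-compile the function
ccount <- cmpfun(count_non_na)

x <- c(rnorm(100000), NA, NA)

# Compilation affects speed only; both versions return the same count
stopifnot(identical(count_non_na(x), ccount(x)))
```

A JIT level of 0 means automatic compilation is off, which is the situation where cmpfun() is most likely to help.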

Joshua Ulrich answered Dec 01 '22 08:12