
Why is R slowing down as time goes on, when the computations are the same?

So I think I don't quite understand how memory works in R. I've been running into problems where the same piece of code gets slower later in the week (using the same R session, sometimes even when I clear the workspace). I've tried to develop a toy problem that I think reproduces the "slowing down effect" I have been observing when working with large objects. Note that the code below is somewhat memory intensive (don't blindly run it without adjusting n and N to match what your setup can handle). It will likely take about 5-10 minutes before you start to see the slowdown (possibly even longer).

N=4e7 #number of simulation runs
n=2e5 #number of simulation runs between calculating time elapsed
meanStorer=rep(0,N)
toc=rep(0,N/n)
x=rep(0,50)

for (i in 1:N){
  if(i%%n == 1){tic=proc.time()[3]}
  x[]=runif(50)
  meanStorer[i] = mean(x)
  if(i%%n == 0){toc[i/n]=proc.time()[3]-tic; print(toc[i/n])}
}

plot(toc)
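For anyone who wants to try the timing pattern with far less memory, the same loop can be wrapped in a function. This is only a sketch: the name time_blocks and the small default sizes are assumptions for illustration, not values from the question, and a run this short is unlikely to show the slowdown by itself.

```r
# Hypothetical helper: runs the same loop as in the question and returns the
# elapsed time (in seconds) for each block of n iterations.
# The defaults are deliberately small; they are NOT the question's values.
time_blocks <- function(N = 2e5, n = 2e4) {
  stopifnot(n > 1, N %% n == 0)
  meanStorer <- numeric(N)   # pre-allocated result vector
  toc <- numeric(N / n)      # elapsed seconds per block
  x <- numeric(50)           # pre-allocated scratch vector
  for (i in seq_len(N)) {
    if (i %% n == 1) tic <- proc.time()[["elapsed"]]   # start of a block
    x[] <- runif(50)
    meanStorer[i] <- mean(x)
    if (i %% n == 0) toc[i / n] <- proc.time()[["elapsed"]] - tic  # end of a block
  }
  toc
}

toc <- time_blocks()
print(toc)
```

Calling time_blocks(N = 4e7, n = 2e5) would correspond to the sizes used in the question, with the memory cost mentioned there.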

meanStorer is certainly large, but it is pre-allocated, so I am not sure why the loop slows down as time goes on. If I clear my workspace and run this code again, it starts out just as slow as the last few calculations of the previous run! I am using RStudio (in case that matters). Here is some of my system information:

  • OS: Windows 7
  • System Type: 64-bit
  • RAM: 8 GB
  • R version: 2.15.1 ($platform yields "x86_64-pc-mingw32")

Here is a plot of toc, prior to using pre-allocation for x (i.e. using x=runif(50) in the loop)

[Plot: toc without pre-allocating x]

Here is a plot of toc, after using pre-allocation for x (i.e. using x[]=runif(50) in the loop)

[Plot: toc with x pre-allocated]

Is ?rm not doing what I think it's doing? What's going on under the hood when I clear the workspace?
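One way to see what rm does is to compare the output of gc() before and after removing a large object. The sketch below assumes a vector of 1e7 doubles (roughly 80 MB), a size chosen only for illustration. rm removes the name from the workspace; the memory itself is reclaimed by the garbage collector, which gc() triggers explicitly. This check does not by itself explain the slowdown seen in R 2.15.1.

```r
big <- numeric(1e7)                    # roughly 80 MB of doubles (assumed size)
print(object.size(big), units = "MB")

before <- gc()   # memory usage matrix; column 2 is the "(Mb)" currently in use
rm(big)          # removes the binding 'big' from the workspace
after <- gc()    # forces a collection so the freed memory is reclaimed

# Vector heap (Vcells) in use, in Mb, before and after removing 'big'
print(c(before = before["Vcells", 2], after = after["Vcells", 2]))
```

If the "after" figure is lower by about the size of the object, the memory was released back to R's heap; whether the operating system sees it returned is a separate matter.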

Update: with the newest version of R (3.1.0), the problem no longer occurs, even when increasing N to N=3e8 (note that R doesn't allow vectors much larger than this).

[Plot: toc under R 3.1.0]

It is quite unsatisfying that the fix is just updating R to the newest version, because I can't figure out why there were problems in version 2.15. It would still be nice to know what caused them, so I am going to leave this question open.

Asked by WetlabStudent on May 27 '14 at 23:05



1 Answer

As you state in your updated question, the high-level answer is that you are using an old version of R with a bug: with R 3.1.0 (the newest version at the time of your update), the problem no longer occurs.
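To check which side of that version boundary a given session is on, something like the following can be used. The 3.1.0 threshold is taken from the question's update; it is an assumption that this is the relevant cutoff, since the exact release that changed the behaviour is not identified here.

```r
# Print the running R version and compare it against 3.1.0, the version in
# which the asker reports the slowdown no longer occurring.
print(R.version.string)

threshold <- "3.1.0"   # assumed cutoff, taken from the question's update
is_older <- getRversion() < threshold

if (is_older) {
  message("This session runs R older than ", threshold,
          "; the slowdown described in the question may apply.")
} else {
  message("This session runs R ", as.character(getRversion()),
          ", at or after ", threshold, ".")
}
```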

Answered Oct 06 '22 at 00:10 (3 revs, 2 users 62%)