So I think I don't quite understand how memory works in R. I've been running into problems where the same piece of code gets slower later in the week (using the same R session; sometimes even when I clear the workspace). I've tried to develop a toy problem that I think reproduces the "slowing down" effect I have been observing when working with large objects. Note that the code below is somewhat memory intensive (don't blindly run it without adjusting n and N to match what your setup can handle). Also note that it will likely take about 5-10 minutes before you start to see the slowing-down pattern (possibly even longer).
N = 4e7   # number of simulation runs
n = 2e5   # number of simulation runs between calculating time elapsed
meanStorer = rep(0, N)
toc = rep(0, N/n)
x = rep(0, 50)
for (i in 1:N) {
  if (i %% n == 1) { tic = proc.time()[3] }
  x[] = runif(50)
  meanStorer[i] = mean(x)
  if (i %% n == 0) {
    toc[i/n] = proc.time()[3] - tic
    print(toc[i/n])
  }
}
plot(toc)
meanStorer is certainly large, but it is pre-allocated, so I am not sure why the loop slows down as time goes on. If I clear my workspace and run this code again, it starts off just as slow as the last few calculations! I am using RStudio (in case that matters). Also, here is some of my system information.
Here is a plot of toc, prior to pre-allocating x (i.e. using x = runif(50) in the loop):
Here is a plot of toc, after pre-allocating x (i.e. using x[] = runif(50) in the loop):
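For anyone comparing the two variants, here is a minimal sketch of the difference, using the same names as the code above:

x <- rep(0, 50)

# Rebinding: runif() allocates a fresh vector and x is pointed at it;
# the old vector becomes garbage to be collected later.
x <- runif(50)

# In-place fill: the empty-bracket assignment copies the new values into
# the existing vector, so x stays at the same allocation.
x[] <- runif(50)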
Is rm() not doing what I think it's doing? What's going on under the hood when I clear the workspace?
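For what it's worth, rm() only removes the binding from the workspace; the memory itself is reclaimed later by the garbage collector, and whether it is returned to the operating system depends on the allocator. A minimal sketch (the name big is just illustrative):

big <- numeric(1e8)   # roughly 0.8 GB of doubles
rm(big)               # removes the binding; the memory is not necessarily freed yet
gc()                  # forces a garbage collection and prints memory usage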
Update: with the newest version of R (3.1.0), the problem no longer persists, even when increasing N to N=3e8 (note that R doesn't allow vectors too much larger than this).
Although it is quite unsatisfying that the fix is simply updating R to the newest version, I can't seem to figure out why there were problems in version 2.15. It would still be nice to know what caused them, so I am going to leave this question open.
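For reference, a back-of-the-envelope sketch of why N can't grow much past this (assuming doubles at 8 bytes each; long vectors only arrived in R 3.0.0, and before that a single vector was capped at 2^31 - 1 elements):

N <- 3e8
N * 8 / 2^30           # ~2.24 GiB needed for meanStorer alone
.Machine$integer.max   # 2147483647, the pre-3.0.0 per-vector length limit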
Beyond performance limitations due to design and implementation, it has to be said that a lot of R code is slow simply because it's poorly written. Few R users have any formal training in programming or software development. Fewer still write R code for a living.
Much of the overhead comes from R needing to check the type of a variable nearly every time it looks at it. This makes it easy to change types and reuse variable names, but it slows down computation for very repetitive tasks, like performing an action in a loop.
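A quick way to see that overhead, assuming some numeric vector v, is to time an explicit loop against the equivalent vectorized call:

v <- runif(1e7)

# Explicit loop: the interpreter re-dispatches and type-checks on every iteration
system.time({
  s <- 0
  for (xi in v) s <- s + xi
})

# Vectorized: one call; the type check happens once and the loop runs in C
system.time(sum(v))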
The total duration of the R script is approximately 11 minutes and 52 seconds, roughly 7.12 seconds per loop. The total duration of the Python script is approximately 2 minutes and 2 seconds, roughly 1.22 seconds per loop. The Python code is therefore about 5.8 times faster than the R alternative!
As you state in your updated question, the high-level answer is that you were using an old version of R with a bug; with the newest version of R (3.1.0), the problem no longer persists.