Clearing memory used by rpy2

How can I clear objects (and the memory they occupy) created via rpy2?

import rpy2.robjects as r
a = r.r('a = matrix(NA, 2000000, 50)')
del a    #if I do this, there is no change in the amount of memory used
r.r('rm(list=(ls(all=TRUE)))') # Same here, the objects disappear, but the memory is still used

The unfortunate effect is that in my application, memory usage keeps increasing until there is none left and it crashes... From the rpy2 docs:

The object itself remains available, and protected from R’s garbage collection until foo is deleted from Python

but even doing:

import rpy2.robjects as r
a = r.r('a = matrix(NA, 2000000, 50)')
r.r.rm('a')
del a
r.r.gc()

does not free the memory used...

EDIT: rpy2 2.0, Win XP, R 2.12.0

Benjamin asked Mar 04 '11 21:03


People also ask

How do I free unused memory in R?

You can force R to check for unused objects and free the memory right away by running the gc() command in R, or by going to Tools -> Memory -> Free Unused R Memory.
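
For example, a minimal sketch (not from the original post) of forcing that collection from Python through rpy2; the matrix size is purely illustrative:

import rpy2.robjects as robjects

robjects.r('x <- matrix(NA, 1000000, 50)')  # allocate a large matrix of NAs in R
robjects.r('rm(x)')                         # drop the binding; the memory is now unreferenced
print(robjects.r('gc()'))                   # force a collection and print R's Ncells/Vcells report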

Why is R using so much memory?

R often uses more memory than expected because operations copy objects. Although these temporary copies are eventually discarded, R can keep holding the space. To give this memory back to the OS you can call the gc() function; when more memory is actually needed, gc() is called automatically anyway.
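
As a small illustration (again not from the original page; the sizes are arbitrary), an arithmetic operation on a large R matrix allocates a fresh copy, and the discarded one is only reclaimed at the next collection:

import rpy2.robjects as robjects

robjects.r('x <- matrix(0, 1000000, 50)')  # roughly 400 MB of doubles
robjects.r('x <- x + 1')                   # the addition builds a new matrix; the old one becomes garbage
print(robjects.r('gc()'))                  # collect the discarded copy and report current usage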

Does R have automatic garbage collection?

R will automatically run garbage collection whenever it needs more space; if you want to see when that is, call gcinfo(TRUE). The only reason you might want to call gc() is to ask R to return memory to the operating system.
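
A quick way to watch this from rpy2 (a sketch, not part of the original answers):

import rpy2.robjects as robjects

robjects.r('gcinfo(TRUE)')                                 # report every garbage collection pass on the R console
robjects.r('for (i in 1:5) x <- matrix(NA, 1000000, 50)')  # repeated allocation will trigger collections
robjects.r('gcinfo(FALSE)')                                # switch the reporting back off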

What is rpy2 package?

rpy2 runs an embedded R and provides access to it from Python using R's own C-API, through either a high-level interface that makes R functions and objects behave much like Python ones (with seamless conversion to numpy and pandas data structures) or a low-level interface closer to the C-API.
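
For example, a minimal sketch of that high-level interface, assuming a reasonably recent rpy2 where importr is available; the packages used are standard ones:

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

stats = importr('stats')  # expose R's stats package as a Python module
draws = stats.rnorm(5)    # call R's rnorm() like a Python function
print(list(draws))        # R vectors are iterable from Python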


1 Answer

There is a paragraph in the rpy2 docs hinting that you may need to run the Python garbage collector frequently when deleting or overwriting large objects:

R objects live in the R memory space, their size unbeknown to Python, and because of that it seems that Python does not always garbage collect often enough when large objects are involved. This is sometimes leading to transient increased memory usage when large objects are overwritten in loops, and although reaching a system’s memory limit appears to trigger garbage collection, one may wish to explicitly trigger the collection.

I was able to force rpy2 to free that large matrix by running gc.collect() immediately after creating the matrix, and again just after deleting it and running R's internal gc() function. The snippet below runs this in a loop with a sleep between passes; use top to watch the memory usage rise and fall.

This was run under Python 2.6 on Ubuntu 10.04 with python-rpy2 version 2.0.8 linked against R version 2.10.1. Hope this helps you make some progress:

import gc
import time

import rpy2.robjects as R

for i in range(5):
    print 'pass %d' % i
    R.r('a = matrix(NA, 1000000, 50)')  # allocate a large matrix in R
    gc.collect()                        # collect on the Python side right after creation
    R.r('rm(a)')                        # drop the binding in the R workspace
    R.r('gc()')                         # let R reclaim the vector memory
    gc.collect()                        # collect again so Python releases its wrappers

    print 'sleeping..'
    time.sleep(5)                       # pause so you can watch memory usage in top
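
If you need this in several places, one option (a hypothetical helper, not something rpy2 provides) is to wrap the R-side and Python-side cleanup in a single function:

import gc

import rpy2.robjects as R

def free_r_object(name):
    """Remove an R object by name, then run both garbage collectors."""
    R.r('rm(%s)' % name)  # drop the binding in the R global environment
    R.r('gc()')           # let R reclaim the vector memory
    gc.collect()          # let Python release any wrappers it still holds

R.r('a = matrix(NA, 1000000, 50)')
free_r_object('a')
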
samplebias answered Sep 30 '22 09:09