 

How can I pass large arrays between numpy and R?

I'm using Python and NumPy/SciPy to do regex matching and stemming for a text-processing application, but I also want to use some of R's statistical packages.

What's the best way to pass the data from python to R? (And back?)

Also, I need to back up the array to disk at some point, so I'm open to saving from Python and loading into R if that's the best solution. The matrices are pretty big (e.g., 100,000 x 10,000), so using sparse matrices might also be nice.

Apologies if this is a repost. I haven't been able to find anything that puts all these pieces together.
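For the sparse-matrix and save-to-disk parts of this question, one format both sides can read is Matrix Market: SciPy writes and reads it with `scipy.io.mmwrite`/`scipy.io.mmread`, and R's Matrix package reads and writes it with `readMM`/`writeMM`. It is plain text, so it is larger and slower to parse than a binary format, but it stores only the nonzero entries. The sketch below is a stdlib-only illustration of the "coordinate real general" variant of that format; the dict of `(row, col) -> value` is a hypothetical stand-in for a `scipy.sparse` matrix, and the function names are made up for this example. With SciPy installed you would normally just call `mmwrite`.

```python
import os
import tempfile
from typing import Dict, Tuple

# Hypothetical stand-in for a scipy.sparse matrix: 0-based (row, col) -> value.
Sparse = Dict[Tuple[int, int], float]

HEADER = "%%MatrixMarket matrix coordinate real general"


def write_matrix_market(path: str, entries: Sparse, nrows: int, ncols: int) -> None:
    """Write the nonzero entries as a 'coordinate real general' Matrix Market file."""
    with open(path, "w") as f:
        f.write(HEADER + "\n")
        f.write(f"{nrows} {ncols} {len(entries)}\n")  # size line: rows, cols, nonzeros
        for (i, j), value in sorted(entries.items()):
            # Matrix Market indices are 1-based, like R's.
            f.write(f"{i + 1} {j + 1} {float(value)!r}\n")


def read_matrix_market(path: str) -> Tuple[Sparse, int, int]:
    """Read a 'coordinate real general' file back into 0-based entries."""
    with open(path) as f:
        if f.readline().lower().split() != HEADER.lower().split():
            raise ValueError("only 'coordinate real general' files are supported")
        line = f.readline()
        while line.startswith("%"):  # skip optional comment lines
            line = f.readline()
        nrows, ncols, nnz = (int(tok) for tok in line.split())
        entries: Sparse = {}
        for _ in range(nnz):
            i, j, value = f.readline().split()
            entries[(int(i) - 1, int(j) - 1)] = float(value)
    return entries, nrows, ncols


if __name__ == "__main__":
    counts = {(0, 0): 1.5, (2, 3): -4.0}  # tiny hypothetical term-document matrix
    path = os.path.join(tempfile.mkdtemp(), "counts.mtx")
    write_matrix_market(path, counts, nrows=3, ncols=5)
    print(read_matrix_market(path))
```

On the R side, `Matrix::readMM("counts.mtx")` loads such a file as a sparse matrix, and `Matrix::writeMM(m, "counts.mtx")` writes one back for Python to read.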

Asked Apr 13 '11 at 18:04 by Abe



2 Answers

  • Have you already looked into RPy? It's a Python interface to R, so it would probably spare you most of the data handling.

  • To back up your NumPy arrays you can use pickle. Since pickle seems to add a lot of overhead when saving huge arrays, NumPy arrays are better saved in the HDF5 format. Here's an article covering that: http://www.shocksolution.com/2010/01/10/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/
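The pickle backup mentioned above is a dump/load round trip. A minimal sketch (the helper names `backup` and `restore` are hypothetical, and a nested list stands in for the array; an ndarray pickles the same way): passing `protocol=pickle.HIGHEST_PROTOCOL` keeps the file binary, whereas protocol 0, the default in Python 2, is a text format. A pickle file is Python-specific and not something R reads directly, so for data that R must load, a portable format such as HDF5 is the safer choice.

```python
import os
import pickle
import tempfile


def backup(obj, path):
    """Pickle obj to path with the highest (binary) protocol available."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def restore(path):
    """Load an object previously written by backup()."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # A nested list stands in for the array; an ndarray pickles the same way.
    matrix = [[1.0, 0.0, 2.5], [0.0, 3.0, 0.0]]
    path = os.path.join(tempfile.mkdtemp(), "matrix.pkl")
    backup(matrix, path)
    print(restore(path) == matrix)
```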

Answered Oct 22 '22 at 00:10 by das_weezul


Use RPy (http://rpy.sourceforge.net/) to call R from Python.

The caveat is that the R and Python versions must exactly match the ones the RPy binary was built against, so you need to be careful with the installation.

Answered Oct 21 '22 at 23:10 by Gael Varoquaux