 

Loading .RData files into Python

Tags:

python

r

rdata

I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other format (such as .csv). Any ideas on the best way to accomplish this?

Stu asked Jan 22 '14 16:01




2 Answers

As an alternative for those who would prefer not to install R just for this task (rpy2 requires it), there is a new package, pyreadr, which reads RData and Rds files directly into Python without depending on an R installation.

It is a wrapper around the C library librdata, so it is very fast.

You can install it easily with pip:

pip install pyreadr 

As an example you would do:

import pyreadr

result = pyreadr.read_r('/path/to/file.RData')  # also works for Rds

# done! let's see what we got
# result is a dictionary where keys are the names of objects and the values are Python objects
print(result.keys())  # let's check what objects we got
df1 = result["df1"]  # extract the pandas data frame for object df1
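Since the question mentions a whole bunch of .RData files, here is a minimal sketch (my own, not taken from the pyreadr docs) that applies pyreadr.read_r to every R data file in one directory. The function name load_rdata_dir, the set of suffixes it accepts, and the optional reader parameter (which only exists so the loop can be exercised without pyreadr installed) are assumptions for illustration.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def load_rdata_dir(
    directory: str,
    reader: Optional[Callable[[str], Dict[Any, Any]]] = None,
) -> Dict[str, Dict[Any, Any]]:
    """Read every .RData/.rda/.rds file in `directory` into one nested dict.

    Outer keys are file names; values are whatever `reader` returns. With the
    default reader (pyreadr.read_r) that is a dict of object name -> DataFrame.
    """
    if reader is None:
        import pyreadr  # imported lazily so the loop itself can be tested without it
        reader = pyreadr.read_r
    suffixes = {".rdata", ".rda", ".rds"}  # hypothetical choice of extensions to accept
    results: Dict[str, Dict[Any, Any]] = {}
    # sorted() gives a stable, alphabetical processing order
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            results[path.name] = reader(str(path))
    return results
```

Usage would then be something like frames = load_rdata_dir('/path/to/dir') followed by frames['file.RData']['df1'] to get one object's data frame.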

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer of this package.

Otto Fajardo answered Sep 28 '22 16:09


People ask this sort of thing on the R-help and R-devel mailing lists, and the usual answer is that the code is the documentation for the .RData file format. So any implementation in another language is very hard.
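Even without a spec, you can at least tell which kind of R file you are holding before picking a tool. The stdlib-only sketch below is not from either answer; the magic strings it checks (RDX2/RDX3 at the start of a save() workspace, a bare "X\n" at the start of a saveRDS() file) and the compression signatures reflect my understanding of what R writes, so treat them as assumptions. The helper names rdata_header and describe are hypothetical.

```python
import bz2
import gzip
import lzma


def rdata_header(path: str) -> bytes:
    """Return the first 5 bytes of an R save()/saveRDS() file after decompression.

    Assumes R compressed the file with gzip (R's default), bzip2 or xz, or not at all.
    """
    with open(path, "rb") as raw:
        magic = raw.read(6)
    if magic.startswith(b"\x1f\x8b"):        # gzip signature
        opener = gzip.open
    elif magic.startswith(b"BZh"):           # bzip2 signature
        opener = bz2.open
    elif magic.startswith(b"\xfd7zXZ\x00"):  # xz signature
        opener = lzma.open
    else:                                    # assume an uncompressed file
        opener = open
    with opener(path, "rb") as fh:
        return fh.read(5)


def describe(path: str) -> str:
    """Classify a file as a save() workspace or a saveRDS() object (assumed magic)."""
    header = rdata_header(path)
    if header[:3] in (b"RDX", b"RDA", b"RDB"):
        # the 4th byte is the format version digit, e.g. b"3" in b"RDX3\n"
        return f"save() workspace, format version {header[3:4].decode()}"
    if header[:2] in (b"X\n", b"A\n", b"B\n"):
        return "saveRDS() single object"
    return "unrecognised"
```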

I think the only reasonable way is to install rpy2 and use R's load function from it, converting to appropriate Python objects as you go. An .RData file can contain structured objects as well as plain tables, so watch out.

Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/

Quicky:

>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")

The objects are now loaded into the R workspace:

>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]

That's a simple numeric vector. d is a data frame, which I can subset to get its columns:

>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[       1,        2,        3, ...,        8,        9,       10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
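If you only need plain Python containers, a small sketch (my own; the helper name rdataframe_to_dict is hypothetical) can lean on the behaviour in the session above: an R data frame indexes like a list of columns, and each column vector is iterable. It assumes the data frame exposes its column names through .names, as rpy2 vectors do. For pandas output, rpy2 also ships a pandas2ri module, but check its documentation for the rpy2 version you have.

```python
from typing import Any, Dict, List


def rdataframe_to_dict(r_df: Any) -> Dict[str, List[Any]]:
    """Convert an rpy2 data frame into a {column name: list of values} dict.

    Assumes iterating `r_df` yields its columns in order and `r_df.names`
    holds the matching column names.
    """
    return {str(name): list(column) for name, column in zip(r_df.names, r_df)}
```

With rpy2 installed, usage would be along the lines of robjects.r['load'](".RData") followed by rdataframe_to_dict(robjects.r['d']).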
Spacedman answered Sep 28 '22 17:09