Loading .RData files into Python

Tags: python, r, rdata

I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other extension (such as .csv). Any ideas on the best way to accomplish this?


asked Jan 22 '14 16:01

Stu

2 Answers

As an alternative for those who would prefer not to install R just to accomplish this task (rpy2 requires it), there is a new package, "pyreadr", which reads RData and Rds files directly into Python without needing R or other external dependencies.

It is a wrapper around the C library librdata, so it is very fast.

You can install it easily with pip:

pip install pyreadr

As an example you would do:

    import pyreadr

    result = pyreadr.read_r('/path/to/file.RData')  # also works for Rds

    # done! let's see what we got
    # result is a dictionary where the keys are the names of the objects
    # and the values are Python objects
    print(result.keys())  # let's check what objects we got
    df1 = result["df1"]  # extract the pandas data frame for object df1
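Since the question mentions a whole bunch of .RData files, the same call can be looped over a folder. The following is only a sketch: the function name `load_rdata_dir`, its `reader` parameter, and the assumption that every file ends in `.RData` are illustrative and not part of pyreadr. The only pyreadr call it relies on is `pyreadr.read_r`, as used in the example above.

```python
from pathlib import Path


def load_rdata_dir(directory, reader=None):
    """Read every *.RData file in `directory`.

    Returns {file stem: {object name: data frame}}.
    `reader` is a hypothetical hook; by default pyreadr.read_r is used.
    """
    if reader is None:
        # Imported lazily so that a stand-in reader can be passed instead.
        import pyreadr
        reader = pyreadr.read_r

    frames = {}
    # sorted() gives a stable, alphabetical order of files.
    for path in sorted(Path(directory).glob("*.RData")):
        # read_r returns a dict-like mapping of object names to data frames.
        frames[path.stem] = dict(reader(str(path)))
    return frames
```

With files such as `series1.RData` in the folder, `load_rdata_dir("/path/to/folder")["series1"]["df1"]` would then give the data frame stored as `df1` in that file, assuming the file really contains an object with that name.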

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer of this package.


answered Sep 28 '22 16:09

Otto Fajardo

People ask this sort of thing on the R-help and R-dev list and the usual answer is that the code is the documentation for the .RData file format. So any other implementation in any other language is hard++.

I think the only reasonable way is to install rpy2 and use R's load function from that, converting to appropriate Python objects as you go. An .RData file can contain structured objects as well as plain tables, so watch out.

Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/

Quicky:

    >>> import rpy2.robjects as robjects
    >>> robjects.r['load'](".RData")

The objects are now loaded into the R workspace.

    >>> robjects.r['y']
    <FloatVector - Python:0x24c6560 / R:0xf1f0e0>
    [0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]

That's a simple numeric vector. d is a data frame, and I can index it to get columns:

    >>> robjects.r['d'][0]
    <IntVector - Python:0x24c9248 / R:0xbbc6c0>
    [       1,        2,        3, ...,        8,        9,       10]
    >>> robjects.r['d'][1]
    <FloatVector - Python:0x24c93b0 / R:0xf1f230>
    [0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
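For the "converting to appropriate Python objects" step, rpy2 ships a pandas converter. The following is a rough sketch, assuming rpy2 3.5 or later with pandas installed and a working R; the conversion API has changed between rpy2 releases, so check the documentation for your version. The function name `load_rdata_frames` is made up for illustration. It keeps only the data frames and skips everything else the file may contain.

```python
def load_rdata_frames(path):
    """Load an .RData file through R and return {name: pandas.DataFrame}.

    Only objects that are R data frames are converted; other objects
    (vectors, lists, models, ...) are skipped.
    """
    # Imported inside the function because rpy2 needs a working R install.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import get_conversion, localconverter
    from rpy2.robjects.vectors import DataFrame

    # R's load() returns the names of the objects it restored.
    names = list(ro.r["load"](path))

    frames = {}
    for name in names:
        obj = ro.globalenv[name]
        if isinstance(obj, DataFrame):
            # Assumption: rpy2 >= 3.5 conversion API (get_conversion()).
            with localconverter(ro.default_converter + pandas2ri.converter):
                frames[name] = get_conversion().rpy2py(obj)
    return frames
```

For the file used above, `load_rdata_frames(".RData")` should return a dictionary with a single key `d`, while `y` is left out because it is a plain vector.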

answered Sep 28 '22 17:09

Spacedman

