
converting .rda to pandas dataframe

Tags: python, r, rpy2

I have some .rda files that I need to access with Python. My code looks like this:

import rpy2.robjects as robjects
from rpy2.robjects import r, pandas2ri

pandas2ri.activate()
df = robjects.r.load("datafile.rda")
df2 = pandas2ri.ri2py_dataframe(df)

where df2 is a pandas dataframe. However, it only contains the header of the .rda file! I have searched extensively, but none of the solutions I found seems to work.

Does anyone have an idea how to efficiently convert an .rda dataframe to a pandas dataframe?

Matina G asked Dec 15 '17 13:12


2 Answers

Thank you for your useful question. I tried the two approaches proposed in the other answers (feather and rpy2) to handle my problem. With feather, I ran into this error:

pyarrow.lib.ArrowInvalid: Not a Feather V1 or Arrow IPC file
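One plausible cause of this error (an assumption, since the exact call is not shown) is that the file handed to the Feather reader is not a Feather file at all, for example the original .rda file rather than one written by write_feather. A minimal sketch for checking what a file actually is by looking at its leading bytes follows. The helper name sniff_format and the returned labels are hypothetical; the magic values are the ones I understand Feather V1, the Arrow IPC file format, gzip, and R's save() format to use.

```python
import gzip

# Leading bytes of the formats discussed here.
FEATHER_V1_MAGIC = b"FEA1"
ARROW_IPC_MAGIC = b"ARROW1"
GZIP_MAGIC = b"\x1f\x8b"
R_SAVE_PREFIXES = (b"RDX", b"RDA", b"RDB")  # e.g. b"RDX2\n" or b"RDX3\n"


def sniff_format(path):
    """Guess a file's format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    if head.startswith(ARROW_IPC_MAGIC):
        return "arrow-ipc"
    if head.startswith(GZIP_MAGIC):
        # Assumption: R's save() gzip-compresses by default, so peek inside.
        with gzip.open(path, "rb") as fh:
            head = fh.read(8)
    if head.startswith(R_SAVE_PREFIXES):
        return "r-save-file"
    return "unknown"
```

If this reports "r-save-file" for the path you pass to feather, the file still needs to be converted in R (or read with a tool that understands R's own format) before a Feather reader can open it. Files compressed with bzip2 or xz are not handled by this sketch and come back as "unknown".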

For rpy2, as mentioned by @Orange: "pandas2ri.ri2py_dataframe does not seem to exist any longer in rpy2 version 3.0.3" or later.

I looked for another workaround and found pyreadr, which worked for me and may help others facing the same problems: https://github.com/ofajardo/pyreadr

Usage: https://gist.github.com/LeiG/8094753a6cc7907c716f#gistcomment-2795790

$ pip install pyreadr

import pyreadr

result = pyreadr.read_r('/path/to/file.RData')  # also works for .rds and .rda files

# done! let's see what we got
# result is a dictionary where the keys are the names of the R objects
# and the values are the corresponding Python objects
print(result.keys())  # let's check what objects we got
df1 = result["df1"]  # extract the pandas data frame for object df1
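Since the object name inside the file is not always known in advance, here is a small sketch of wrapping the lookup. The helpers load_r_objects and pick_frame are hypothetical names, not part of pyreadr, and the reader argument is only a hook so the logic can be tried without a real file. It assumes pyreadr.read_r returns a dictionary-like object keyed by R object name, with a single None key for .rds files.

```python
def load_r_objects(path, reader=None):
    """Return a mapping of R object names to converted Python objects.

    `reader` is a hypothetical testing hook; by default pyreadr.read_r is used.
    """
    if reader is None:
        import pyreadr  # third-party, installed with: pip install pyreadr
        reader = pyreadr.read_r
    return reader(path)


def pick_frame(objects, name=None):
    """Select one object from the mapping returned by load_r_objects."""
    if name in objects:
        return objects[name]
    if name is None and len(objects) == 1:
        # Only one object in the file, so there is no ambiguity.
        return next(iter(objects.values()))
    available = ", ".join(str(key) for key in objects)
    raise KeyError(f"object {name!r} not found; available: {available}")
```

Usage would look like df = pick_frame(load_r_objects('/path/to/file.rda')) when the file holds a single data frame, or pick_frame(objects, "df1") when it holds several.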
Hoa Nguyen answered Nov 05 '22 21:11


You could try feather, a language-agnostic file format for data frames that can be written and read from both R and Python. First, in R, write the data frame out as a feather file:

# Install feather
devtools::install_github("wesm/feather/R")

library(feather)
# Restore the saved objects; assumes the data frame inside is named datafile
load("datafile.rda")
path <- "your_file_path"
write_feather(datafile, path)

Then install the Python package:

$ pip install feather-format

And load the data file in Python:

import feather
path = 'your_file_path'
datafile = feather.read_dataframe(path)
dshkol answered Nov 05 '22 20:11