I have some .rda files that I need to access with Python. My code looks like this:
import rpy2.robjects as robjects
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
df = robjects.r.load("datafile.rda")
df2 = pandas2ri.ri2py_dataframe(df)
where df2 is a pandas dataframe. However, it only contains the header of the .rda
file! I have searched back and forth, but none of the proposed solutions seems to work.
Does anyone have an idea how to efficiently convert an .rda
dataframe to a pandas dataframe?
Thank you for your useful question. I tried the two ways proposed above to handle my problem.
For feather, I faced this issue:
pyarrow.lib.ArrowInvalid: Not a Feather V1 or Arrow IPC file
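The post does not say which file was passed to the feather reader. One plausible cause of this error (an assumption, not something confirmed above) is pointing the reader at the original .rda file rather than at a file written by write_feather. A minimal sketch, using only the standard library, that checks the leading magic bytes to tell these formats apart; `sniff_format` is a hypothetical helper name, and only gzip-compressed R files are unwrapped here:

```python
import gzip


def sniff_format(path):
    """Return a rough format label for the file at `path` from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)

    if head.startswith(b"FEA1"):
        return "feather-v1"   # Feather V1 magic
    if head.startswith(b"ARROW1"):
        return "arrow-ipc"    # Arrow IPC file (Feather V2) magic

    if head.startswith(b"\x1f\x8b"):
        # gzip-compressed (the default for R's save()): peek inside.
        with gzip.open(path, "rb") as fh:
            head = fh.read(5)

    # R save() files start with RDX2/RDX3 (XDR), RDA2/RDA3 (ASCII) or RDB2/RDB3.
    if head.startswith((b"RDX", b"RDA", b"RDB")):
        return "r-save"

    # bzip2- or xz-compressed R files and everything else end up here.
    return "unknown"
```

If this reports "r-save" for the path given to feather, the file still has to be exported from R with write_feather, or read with a different tool.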
For rpy2, as mentioned by @Orange: "pandas2ri.ri2py_dataframe does not seem to exist any longer in rpy2 version 3.0.3" or later.
I searched for another workaround and found pyreadr, which worked for me and may help others facing the same problem: https://github.com/ofajardo/pyreadr
Usage: https://gist.github.com/LeiG/8094753a6cc7907c716f#gistcomment-2795790
pip install pyreadr
import pyreadr
result = pyreadr.read_r('/path/to/file.RData') # also works for Rds, rda
# done! let's see what we got
# result is a dictionary: the keys are the object names and the values are
# Python objects (pandas data frames)
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
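Because read_r returns a dictionary, the object name has to be known before a frame can be extracted. A small sketch of a helper that returns the only object when there is exactly one and otherwise requires a name; `pick_frame` is a hypothetical name, not part of pyreadr, and it works on any mapping. It also covers .rds files, where pyreadr stores the single unnamed object under the key None:

```python
def pick_frame(result, name=None):
    """Pick one object from the mapping returned by pyreadr.read_r.

    With `name` given, return that object. Without it, return the sole
    object if the mapping holds exactly one (for .rds files the key is None).
    """
    if name is not None:
        if name not in result:
            raise KeyError(f"{name!r} not found; available objects: {list(result)}")
        return result[name]

    if len(result) != 1:
        raise ValueError(
            f"expected exactly one object, found {len(result)}: {list(result)}"
        )
    return next(iter(result.values()))
```

With the code above, `df1 = pick_frame(result, "df1")` is equivalent to `result["df1"]`, and `pick_frame(result)` is enough when the file holds a single object.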
You could try the feather library, a language-agnostic data frame file format that can be written and read from both R and Python. First, write the data frame to a feather file from R:
# Install feather
devtools::install_github("wesm/feather/R")
library(feather)
path <- "your_file_path"
write_feather(datafile, path)
Then install it in Python:
$ pip install feather-format
And load your data file:
import feather
path = 'your_file_path'
datafile = feather.read_dataframe(path)