I have never used rpy2 before, but I am wondering whether I could use it to save a Python object (a pandas DataFrame) in an R-readable file. I am having trouble moving objects between these environments, mainly because I'm using Windows and the data source is an Excel file. Yes, the kind that has cells with text including inverted commas, newlines, and all the stuff that CSV can't handle adequately.
I usually rely on XLConnectJars, but it seems to be broken:
Installing package(s) into ‘C:/Program Files/R/library’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/bin/windows/contrib/2.15/XLConnectJars_0.2-4.zip'
Content type 'application/zip' length 16538311 bytes (15.8 Mb)
opened URL
downloaded 15.3 Mb
Warning in install.packages :
downloaded length 16011264 != reported length 16538311
pandas reads it properly, but I need to use the information in R.
Here is how you write/read .RData files with rpy2 (since the accepted solution is deprecated and doesn't show how to save to an .RData file):
import rpy2
from rpy2 import robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
# read .RData file as a pandas dataframe
def load_rdata_file(filename):
    r_data = robjects.r['get'](robjects.r['load'](filename))
    df = pandas2ri.ri2py(r_data)
    return df

# write pandas dataframe to an .RData file
def save_rdata_file(df, filename):
    r_data = pandas2ri.py2ri(df)
    robjects.r.assign("my_df", r_data)
    robjects.r("save(my_df, file='{}')".format(filename))
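One caveat with building the R command through string formatting, especially on Windows: R treats a backslash inside a string literal as an escape character, so a path such as C:\data\out.RData will not survive being pasted into save(..., file='...'). Below is a minimal sketch of one way around this, assuming the path is written Windows-style. The names r_path_literal and build_save_command are hypothetical helpers for illustration, not part of rpy2.

```python
from pathlib import PureWindowsPath


def r_path_literal(path):
    """Return `path` as a single-quoted R string literal with forward slashes."""
    # PureWindowsPath parses both "\" and "/" separators on any OS;
    # as_posix() rewrites them all as "/", which R accepts on Windows.
    posix = PureWindowsPath(path).as_posix()
    # Escape any single quote so it cannot end the R literal early.
    return "'" + posix.replace("'", "\\'") + "'"


def build_save_command(r_name, path):
    """Build the R source text for save(<r_name>, file=<path>)."""
    return "save({}, file={})".format(r_name, r_path_literal(path))


print(build_save_command("my_df", r"C:\data\my file.RData"))
```

The resulting string could be passed to robjects.r(...) in place of the formatted string used in save_rdata_file.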
You can use rpy2 to do this. Once you have the data in a pandas DataFrame, you have to pass it to R. pandas ships an experimental interface (pandas.rpy.common) between pandas DataFrames and R data.frames. A code example copied from its documentation:
from pandas import DataFrame
import pandas.rpy.common as com
df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
               index=["one", "two", "three"])
r_dataframe = com.convert_to_r_dataframe(df)
print(type(r_dataframe))
<class 'rpy2.robjects.vectors.DataFrame'>
print(r_dataframe)
      A B C
one   1 4 7
two   2 5 8
three 3 6 9
Using the most recent version of rpy2, version 3.3.2, I was unable to get the other answers to work. It appears that conversion works a bit differently now.
import pandas
p_df = pandas.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
The following code will convert the above pandas dataframe to an R dataframe and save the R dataframe as an R .rds file.
from rpy2 import robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
# Convert pandas dataframe to R dataframe
with localconverter(robjects.default_converter + pandas2ri.converter):
    r_df = robjects.conversion.py2rpy(p_df)
# Save R dataframe as .rds file
r_file = "file.rds"
robjects.r.assign("my_df_tosave", r_df)
robjects.r(f"saveRDS(my_df_tosave, file='{r_file}')")
The following code will load the .rds file and convert it back to a pandas dataframe.
# Load as R dataframe from .rds file
r_file = "file.rds"
robjects.r(f"df_to_load <- readRDS('{r_file}')")
r_df = robjects.r["df_to_load"]
# Convert R dataframe to pandas dataframe
with localconverter(robjects.default_converter + pandas2ri.converter):
    p_df = robjects.conversion.rpy2py(r_df)