
Can I use rpy2 to save a pandas dataframe to an .Rdata file?

Tags: python, pandas, r

I have never used rpy2 before, but I am wondering whether I could use it to save a Python object (a pandas DataFrame) in an R-readable file. I am having trouble moving objects between these environments, mainly because I'm using Windows and the data source is an Excel file. Yes, the kind that has cells with text including inverted commas, newlines, and all the stuff that CSV can't handle adequately.

I usually rely on XLConnectJars, but it seems to be broken:

Installing package(s) into ‘C:/Program Files/R/library’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/bin/windows/contrib/2.15/XLConnectJars_0.2-4.zip'
Content type 'application/zip' length 16538311 bytes (15.8 Mb)
opened URL
downloaded 15.3 Mb

Warning in install.packages :
  downloaded length 16011264 != reported length 16538311

pandas reads the Excel file properly, but I need to use the data in R.

dmvianna asked Feb 26 '13 05:02


3 Answers

Here is how to write and read .RData files with rpy2 (the accepted solution is deprecated and doesn't show how to save to an .RData file):

# Written for rpy2 2.x; in rpy2 3.x, ri2py/py2ri were renamed rpy2py/py2rpy
from rpy2 import robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

# read .RData file as a pandas dataframe
# (load() returns the names of the restored objects; this assumes the file holds exactly one)
def load_rdata_file(filename):
    r_data = robjects.r['get'](robjects.r['load'](filename))
    df = pandas2ri.ri2py(r_data)
    return df

# write pandas dataframe to an .RData file under the R name "my_df"
def save_rdata_file(df, filename):
    r_data = pandas2ri.py2ri(df)
    robjects.r.assign("my_df", r_data)
    robjects.r("save(my_df, file='{}')".format(filename))
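One fragile spot in `save_rdata_file` is that the filename is pasted into an R string literal, so a Windows path with backslashes, or a name containing a single quote, may reach R as a different path or fail to parse. A minimal sketch of one way around this, assuming two hypothetical helpers, `r_string_literal` and `build_save_command` (neither is part of rpy2 or of the answer above), that escape the path before it is formatted into the command:

```python
def r_string_literal(text):
    """Return `text` as a single-quoted R string literal.

    Hypothetical helper, not part of rpy2: it escapes backslashes and
    single quotes so that R's parser reads the value back unchanged.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_save_command(object_name, filename):
    """Build the R source text for save(<object_name>, file=<filename>)."""
    return "save({}, file={})".format(object_name, r_string_literal(filename))


# Example with a Windows-style path (raw string, single backslashes).
command = build_save_command("my_df", r"C:\data\my df.RData")
print(command)
```

The resulting string could then be passed to `robjects.r(...)` in place of the hand-formatted one. Writing the path with forward slashes is another option, since R accepts them on Windows.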
anthonybell answered Oct 16 '22 18:10


You can use rpy2 to do this. Once you have the data in a pandas DataFrame, you have to transfer it to R. pandas provides an experimental interface, pandas.rpy.common, between pandas DataFrames and R data.frames. A code example copied from its documentation:

from pandas import DataFrame
import pandas.rpy.common as com

df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
               index=["one", "two", "three"])
r_dataframe = com.convert_to_r_dataframe(df)

print(type(r_dataframe))
# <class 'rpy2.robjects.vectors.DataFrame'>

print(r_dataframe)
#       A B C
# one   1 4 7
# two   2 5 8
# three 3 6 9
Paul Hiemstra answered Oct 16 '22 18:10


With rpy2 3.3.2, the most recent version at the time of writing, I was unable to get the other answers to work. Conversion appears to work a bit differently now.

import pandas
p_df = pandas.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

The following code will convert the above pandas dataframe to an R dataframe and save the R dataframe as an R .rds file.

from rpy2 import robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# Convert pandas dataframe to R dataframe
with localconverter(robjects.default_converter + pandas2ri.converter):
    r_df = robjects.conversion.py2rpy(p_df)

# Save R dataframe as .rds file
r_file = "file.rds"
robjects.r.assign("my_df_tosave", r_df)
robjects.r(f"saveRDS(my_df_tosave, file='{r_file}')")

The following code will load the .rds file and convert it back to a pandas dataframe.

# Load as R dataframe from .rds file
r_file = "file.rds"
robjects.r(f"df_to_load <- readRDS('{r_file}')") 
r_df = robjects.r["df_to_load"]

# Convert R dataframe to pandas dataframe
with localconverter(robjects.default_converter + pandas2ri.converter):
    p_df = robjects.conversion.rpy2py(r_df)
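Since the question asks for an .RData file rather than .rds, the same conversion can be paired with R's `save` and `load`. A rough sketch under the same rpy2 3.x assumptions as the code above; the function names `save_rdata`, `load_rdata` and `check_r_name`, the default object name `my_df`, and the simplified name check are illustrative choices rather than anything rpy2 provides, and the code has not been checked against every rpy2 release:

```python
import re

# Simplified pattern for an R object name that is easy to type in R
# (an assumption for this sketch; R's real rules are more permissive).
_R_NAME = re.compile(r"^[A-Za-z.][A-Za-z0-9._]*$")


def check_r_name(name):
    """Return `name` if it matches the simplified pattern, else raise."""
    if not _R_NAME.match(name):
        raise ValueError("not a simple R object name: {!r}".format(name))
    return name


def save_rdata(p_df, filename, name="my_df"):
    """Save a pandas DataFrame to an .RData file under the R name `name`."""
    check_r_name(name)
    # Imported lazily so the name check is usable without rpy2 installed.
    from rpy2 import robjects
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    with localconverter(robjects.default_converter + pandas2ri.converter):
        # Assigning into the global environment converts the DataFrame
        # with the converter that is active inside this block.
        robjects.globalenv[name] = p_df
    # save(list = <names>, file = <path>, envir = <env>) writes those objects.
    robjects.r["save"](list=robjects.StrVector([name]), file=filename,
                       envir=robjects.globalenv)


def load_rdata(filename):
    """Load an .RData file and return a dict of {R name: converted object}."""
    from rpy2 import robjects
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # load() restores the objects and returns their names.
    names = list(robjects.r["load"](filename, envir=robjects.globalenv))
    with localconverter(robjects.default_converter + pandas2ri.converter):
        return {n: robjects.globalenv[n] for n in names}
```

Calling `save_rdata(p_df, "file.RData")` should leave a file that R restores with `load("file.RData")`, making the data frame available there as `my_df`.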
Svaberg answered Oct 16 '22 19:10