Python Pandas to R dataframe

I want to convert a Python pandas DataFrame to an R data frame. I found a few libraries for this problem, among them rpy2:

http://pandas.pydata.org/pandas-docs/stable/r_interface.html

But I couldn't find a method for saving it or transferring it to R.

First I tried "to_csv":

import pandas.rpy.common as com

df_R = com.convert_to_r_dataframe(df_total)
df_R.to_csv(direc + "/qap/detail_summary_R/" + "distance_" + str(gp_num) + ".csv", sep=",")

But it gives me an error:

AttributeError: 'DataFrame' object has no attribute 'to_csv'

So I checked its type, which was:

<class 'rpy2.robjects.vectors.DataFrame'>

How can I save an object of this type to a CSV file, or transfer it to R?

asked Jun 07 '14 by JonghoKim



2 Answers

If standard text-based formats (CSV) are too slow or bulky, I'd recommend feather, a serialization format built on Apache Arrow. It was developed explicitly for performance and interoperability between Python and R by Hadley Wickham (RStudio, ggplot2, etc.) and Wes McKinney (pandas) (see here).

You need pandas version 0.20.0+ and pip install feather-format (newer pandas versions require pyarrow instead); then you can use the to_feather/read_feather operations as drop-in replacements for to_csv/read_csv on the pandas DataFrame:

import pandas as pd

# call these on the pandas DataFrame, not on the rpy2-converted object
df_total.to_feather('filename.feather')
df_total = pd.read_feather('filename.feather')

The R equivalents (using the feather package) are:

df <- feather::read_feather('filename.feather')
feather::write_feather(df, 'filename.feather')

Besides some minor tweaks (e.g. feather can't store a custom DataFrame index, so you'll need to call df.reset_index() first), this is a fast and easy drop-in replacement for CSV, pickle, etc.

answered Oct 04 '22 by jayelm


The recent documentation https://rpy2.github.io/doc/v3.2.x/html/generated_rst/pandas.html has a section about interacting with pandas.

Otherwise, objects of type rpy2.robjects.vectors.DataFrame have a method to_csvfile, not to_csv:

https://rpy2.github.io/doc/v3.2.x/html/vector.html#rpy2.robjects.vectors.DataFrame.to_csvfile
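If the goal is only a CSV file that R can read, the rpy2 conversion can also be skipped: to_csv exists on the pandas DataFrame, just not on the converted rpy2 object. A minimal sketch, assuming pandas is installed; df_total is a hypothetical stand-in for the question's frame, and the file name follows the question's distance_<gp_num>.csv pattern with a made-up gp_num of 1.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the question's df_total
df_total = pd.DataFrame({"start": ["a", "b"], "end": ["b", "c"], "distance": [1.5, 2.0]})

# Call to_csv on the pandas object itself, before any rpy2 conversion;
# index=False keeps the row index out of the file
path = os.path.join(tempfile.mkdtemp(), "distance_1.csv")
df_total.to_csv(path, sep=",", index=False)

with open(path) as fh:
    csv_text = fh.read()
print(csv_text)
```

Without index=False, pandas writes the row index as an extra unnamed first column, which then shows up as an additional column when the file is read in R with read.csv().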

If you want to pass data between Python and R, there are more efficient ways than writing and reading CSV files. Try the conversion system:

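import pandas as pd
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

# globally enable automatic pandas <-> R data.frame conversion
pandas2ri.activate()

base = importr('base')
my_pandas_dataframe = pd.DataFrame({'x': [1.0, 2.0, 3.0]})  # example data
# call an R function on a Pandas DataFrame
print(base.summary(my_pandas_dataframe))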

answered Oct 04 '22 by lgautier