 

Issue converting a Python pandas DataFrame to an R data.frame for use with rpy2

I am having trouble converting a pandas DataFrame in Python to an R object, for future use in R using rpy2.

The new pandas release, 0.8.0 (released a few weeks ago), has a function to convert pandas DataFrames to R data.frames. The problem is converting the first column of my DataFrame, which consists of Python datetime objects (successive timestamps in a time series). The conversion returns a StrVector of date-time strings rather than a vector of R date-time objects, which I believe are of class "POSIXct".

I know that a single string of that form can be converted to POSIXct in R with as.POSIXct('yyyy-mm-dd hh:mm:ss'). Unfortunately, I have not figured out how to convert all the strings in the StrVector to POSIXct from Python using rpy2. The dates need to be POSIXct to be used with the TTR library in R. Below is the relevant Python code:

```python
import pandas as pd
import pandas.rpy.common as com  # pandas 0.8-era rpy helpers (later removed from pandas)
import rpy2.robjects as robjects

r = robjects.r
r.library('TTR')  # TTR contains the ADX function used below

# parse_dates=[0] turns the first column into datetime.datetime objects
dataframe = pd.read_csv('file_name', parse_dates=[0],
                        names=['Date', 'Col1', 'Col2', 'Col3'])
r_dataframe = com.convert_to_r_dataframe(dataframe)

ADX = r['ADX']          # bind the R function to a Python name
adx = ADX(r_dataframe)  # fails; I believe because the dates in r_dataframe are a StrVector
```

Furthermore, I don't think the StrVector can be iterated over to convert each element to a POSIXct object individually, given how a StrVector is defined. Is there perhaps a way to cast a StrVector to a generic vector?

Any help or insight into this matter is greatly appreciated. I am a novice programmer and have been working on this for a couple of hours now, to no avail.

Thank you!

Asked by yayder9990 on Jul 16 '12 at 20:07


2 Answers

Your ADX call fails because ADX expects an xts or matrix-like object with three columns: High, Low, and Close. Your object has four columns. Drop the date column before passing r_dataframe to ADX and everything should work; you can then add the datetime column back to the ADX output.
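One way to do that split on the pandas side, before conversion, is sketched below with a current pandas. It assumes Col1, Col2, and Col3 hold High, Low, and Close, and that the ADX result comes back with one row per input row; the sample rows are made up, and `reattach_dates` is a hypothetical helper, not part of pandas or rpy2.

```python
import pandas as pd

# Hypothetical sample rows standing in for the CSV contents
frame = pd.DataFrame({
    'Date': pd.to_datetime(['2012-07-16 09:30:00',
                            '2012-07-16 09:31:00',
                            '2012-07-16 09:32:00']),
    'Col1': [10.5, 10.7, 10.6],
    'Col2': [10.1, 10.3, 10.2],
    'Col3': [10.4, 10.6, 10.3],
})

# Assumed mapping: Col1/Col2/Col3 are High/Low/Close
hlc = frame.rename(columns={'Col1': 'High', 'Col2': 'Low', 'Col3': 'Close'})
dates = hlc.pop('Date')  # removes Date from hlc and keeps it for later

# hlc now has exactly the three columns ADX expects; convert it and call ADX here.


def reattach_dates(result, dates):
    """Insert the saved dates as the first column, matching rows by position."""
    out = result.reset_index(drop=True).copy()
    out.insert(0, 'Date', dates.reset_index(drop=True))
    return out
```

Once the ADX output has been converted back to a pandas DataFrame, `reattach_dates(adx_frame, dates)` would put the timestamps back in front. Matching by position rather than by label is a deliberate choice, since the converted result is not guaranteed to carry the original index.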

Alternatively, if you set the row.names attribute of your R data.frame to the values of the Date column and then remove the Date column, you can convert the data.frame to an xts object with as.xts(r.data.frame). You can then pass that to ADX and convert the result back to a pandas DataFrame.
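The pandas half of this second approach can be prepared by moving Date out of the data columns and into the row labels, formatted as strings that as.POSIXct parses by default. A minimal sketch with a current pandas; the sample rows and the Col1/Col2/Col3 to High/Low/Close mapping are assumptions, and the R steps in the trailing comment are an untested outline rather than a verified recipe.

```python
import pandas as pd

# Hypothetical sample rows; assumes Col1/Col2/Col3 are High/Low/Close
raw = pd.DataFrame({
    'Date': pd.to_datetime(['2012-07-16 09:30:00', '2012-07-16 09:31:00']),
    'Col1': [10.5, 10.7],
    'Col2': [10.1, 10.3],
    'Col3': [10.4, 10.6],
}).rename(columns={'Col1': 'High', 'Col2': 'Low', 'Col3': 'Close'})

# Move Date out of the data columns and into the row labels
hlc = raw.set_index('Date')

# R row names are character strings; use the 'yyyy-mm-dd hh:mm:ss' layout
# that as.POSIXct understands without an explicit format argument
row_names = hlc.index.strftime('%Y-%m-%d %H:%M:%S').tolist()

# Untested outline of the R side, with df the converted three-column data.frame:
#   rownames(df) <- row_names
#   x <- xts::as.xts(df)
#   adx <- TTR::ADX(x)
```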

Answered by Joshua Ulrich on Oct 18 '22 at 23:10


dalejung on GitHub has recently done quite a bit of work on a tighter pandas-xts interface built on rpy2. You might get in touch with him or join the PyData mailing list.

Answered by Wes McKinney on Oct 18 '22 at 21:10