Some R datasets can be loaded into a Pandas DataFrame or Panel quite easily:
import pandas.rpy.common as com
infert = com.load_data('infert')
print(infert.head())
This appears to work as long as the R dataset has at most 3 dimensions. Higher-dimensional datasets produce an error message:
In [67]: com.load_data('Titanic')
Cannot handle dim=4
This error message originates in the _convert_array function in rpy/common.py.
Sure, it makes sense that Pandas cannot directly shoehorn a 4-dimensional array into a DataFrame or Panel, but is there some workaround to load datasets like Titanic into a DataFrame (maybe with a hierarchical index)?
Using @joran's very helpful suggestion, after installing the reshape package with
% sudo R
R> install.packages('reshape')
I managed to load the Titanic dataset into a Pandas DataFrame with:
import pandas as pd
import pandas.rpy.common as com
import rpy2.robjects as ro
r = ro.r
r('library(reshape)')
df = com.convert_robj(r('melt(Titanic)'))
print(df.head())
which printed
  Class     Sex    Age Survived  value
1   1st    Male  Child       No      0
2   2nd    Male  Child       No      0
3   3rd    Male  Child       No     35
4  Crew    Male  Child       No      0
5   1st  Female  Child       No      0
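To make it clearer what melt is doing here, the following is a rough pure-Python sketch of the same idea: every cell of an N-dimensional table becomes one row holding its dimension labels plus the cell value. The function melt_array, the toy 2 x 2 x 2 table and its labels are all hypothetical and only for illustration; this is not how the reshape package or pandas implements it. The first dimension is made to vary fastest so that the row order resembles the output above.

```python
from itertools import product


def melt_array(values, dim_names, dim_labels):
    """Flatten a nested-list N-d table into long-format rows, one per cell.

    The first dimension varies fastest, similar to the row order of the
    melted output shown above. (Hypothetical helper, for illustration only.)
    """
    rows = []
    # product() varies its LAST argument fastest, so iterate over the
    # dimensions in reverse order and flip each index tuple back.
    ranges = [range(len(labels)) for labels in reversed(dim_labels)]
    for rev_idx in product(*ranges):
        idx = rev_idx[::-1]
        # Walk down the nested lists to reach the cell at this index.
        cell = values
        for i in idx:
            cell = cell[i]
        row = {name: labels[i]
               for name, labels, i in zip(dim_names, dim_labels, idx)}
        row["value"] = cell
        rows.append(row)
    return rows


# Hypothetical 2 x 2 x 2 table of counts, indexed as toy[group][sex][outcome].
toy = [[[1, 2], [3, 4]],
       [[5, 6], [7, 8]]]
names = ["Group", "Sex", "Outcome"]
labels = [["A", "B"], ["Male", "Female"], ["No", "Yes"]]

rows = melt_array(toy, names, labels)
for row in rows[:4]:
    print(row)
```

Once the data is in this long format, the hierarchical index asked about in the question can be built on the Pandas side by moving the dimension columns into the index, for example df.set_index(['Class', 'Sex', 'Age', 'Survived']) on the melted Titanic DataFrame.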