Is there a way to access R data frame column names in python/rpy2?

I have an R data frame, saved in Database02.Rda. Loading it

import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")

works fine. However:

print(robjects.r.names("df"))

yields

NULL

Also, as an example, column 214 (213 if we count starting with 0) is named REGION.

print(robjects.r.table(robjects.r["df"][213]))

works fine:

Region 1   Region 2   ...
    9811       3451   ...

but we should also be able to do

print(robjects.r.table("df$REGION"))

This, however, results in

df$REGION 
        1

(the same output appears for column names that do not exist at all). Also:

print(robjects.r.table(robjects.r["df"]["REGION"]))

gives an error:

TypeError: SexpVector indices must be integers, not str

Now, the docs say that names cannot be used for subsetting in Python. Am I correct to assume that the column names are not imported with the rest of the data when loading the data frame with python/rpy2? If so, is the easiest way to access them to save and load them as a separate list and build a dict in Python that maps the names to the column index numbers? This does not seem very generic, however. Is there a way to extract the column names directly?

The versions I use are: R 3.2.2, Python 3.5.0, rpy2 2.7.8.

asked Mar 03 '16 19:03

0range

2 Answers

When doing the following, you are loading whatever objects are in Database02.Rda into R's "global environment".

import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")

robjects.globalenv is an Environment. You can list its contents with:

tuple(robjects.globalenv.keys())

I understand that one of your objects is called df. You can access it with:

df = robjects.globalenv['df']

If df is a list or a data frame, you can access its named elements with rx2 (the doc is your friend here again). To get the one called REGION, do:

df.rx2("REGION")

Listing the names of all elements in a list or data frame is easy:

tuple(df.names)
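The question asks about mapping column names to index numbers. Once the names are available, that mapping is a one-line dictionary. The sketch below is duck-typed: it only assumes an object with a names attribute, so a small stand-in class replaces the real data frame and the snippet runs without R installed. FakeFrame and every column name except REGION are made up for illustration; with rpy2 you would pass robjects.globalenv['df'] instead.

```python
def name_to_index(r_object):
    """Map each element name to its 0-based position (Python-style indexing)."""
    return {name: position for position, name in enumerate(r_object.names)}


class FakeFrame:
    """Hypothetical stand-in for an rpy2 data frame; only .names is modelled."""

    def __init__(self, names):
        self.names = names


# Made-up column names; only REGION comes from the question.
frame = FakeFrame(("ID", "YEAR", "REGION"))
positions = name_to_index(frame)
print(positions["REGION"])  # 0-based position of the REGION column
```

Keep in mind that Python-style indexing on rpy2 vectors is 0-based, while rx and rx2 follow R and are 1-based, so for the data frame in the question df[213] and df.rx2(214) should refer to the same column.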

answered Nov 14 '22 22:11

lgautier

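If you call R functions from Python and keep the results in Python variables, the global environment answer does not apply, because those objects are never assigned in R's global environment. Still, kudos to @lgautier, the creator and maintainer of this package. In R, the dollar sign $ is used frequently to extract elements, but it is not valid Python syntax. This is what I learned: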

print(pamk_clusters$pamobject$clusinfo)

will not work, and its equivalent

print(pamk_clusters[["pamobject"]][["clusinfo"]])

does not work either. However, after some digging in the documentation:

http://rpy2.readthedocs.io/en/version_2.7.x/vector.html#extracting-r-style

Access to R-style extracting/subsetting is granted through the two delegators rx and rx2, representing the R functions [ and [[ respectively.

This works as expected

print(pamk_clusters.rx2("pamobject").rx2("clusinfo"))
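The chain of rx2 calls above mirrors R's pamk_clusters$pamobject$clusinfo one level at a time. Here is a minimal sketch of a helper that folds such a chain, assuming only that each level exposes an rx2 method. The names rx2_path and FakeList are hypothetical and not part of rpy2; the stand-in class is there so the snippet runs without R.

```python
from functools import reduce


def rx2_path(r_object, *names):
    """Apply rx2 once per name, like obj$a$b or obj[["a"]][["b"]] in R."""
    return reduce(lambda current, name: current.rx2(name), names, r_object)


class FakeList:
    """Hypothetical stand-in for an rpy2 list; only rx2 by name is modelled."""

    def __init__(self, **elements):
        self._elements = elements

    def rx2(self, name):
        return self._elements[name]


# Shaped like the pamk result in this answer, with a placeholder value.
clusters = FakeList(pamobject=FakeList(clusinfo="placeholder"))
print(rx2_path(clusters, "pamobject", "clusinfo"))
```

With a real result, rx2_path(pamk_clusters, "pamobject", "clusinfo") would do the same as the chained call above.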

I commented on the issue tracker about the clarity of the documentation:

https://bitbucket.org/rpy2/rpy2/issues/436/acessing-dataframe-elements-using-rpy2

I am using rpy2 on Win7 with ipython. To help others dig through the formatting, here is a setup that seems to work:

import rpy2
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr

base = importr('base')
utils = importr('utils')
utils.chooseCRANmirror(ind=1)

cluster = importr('cluster')
stats = importr('stats')
#utils.install_packages("fpc")
fpc = importr('fpc')

import pickle
with open ('points', 'rb') as fp:
    points = pickle.load(fp) 
# data above is stored as binary object
# online:  http://www.mshaffer.com/arizona/dissertation/points

import rpy2.robjects.numpy2ri as npr   
npr.activate()

k = robjects.IntVector(range(3, 8))   # r-syntax  3:7   # I expect 5
pamk_clusters = fpc.pamk(points,k)

print( base.summary(pamk_clusters) )
base.print( base.summary(pamk_clusters) )

utils.str(pamk_clusters)

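# R extraction syntax does not carry over to Python:
# the $ lines are a syntax error, and the [[ ]] line fails at run time.
# print(pamk_clusters$pamobject$clusinfo)
# base.print(pamk_clusters$pamobject$clusinfo)
# print(pamk_clusters[["pamobject"]][["clusinfo"]])

# rx2 is the rpy2 counterpart of R's [[ ]]
print(pamk_clusters.rx2("pamobject").rx2("clusinfo"))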

pam_clusters = cluster.pam(points,5)        # much slower
kmeans_clusters = stats.kmeans(points,5)    # much faster

utils.str(kmeans_clusters)

print(kmeans_clusters.rx2("cluster"))

R has been a standard for statistical computing for nearly 25 years. It is based on S, which is about forty years old and dates from a time when computing efficiency mattered a lot. https://en.wikipedia.org/wiki/R_(programming_language)

Again, @lgautier, thank you for making R more readily accessible within Python.

answered Nov 14 '22 20:11

mshaffer


Tags: python, r, rpy2