I have an R data frame, saved in Database02.Rda. Loading it
import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")
works fine. However:
print(robjects.r.names("df"))
yields
NULL
Also, as an example, column 214 (213 if we count starting with 0) is named REGION.
print(robjects.r.table(robjects.r["df"][213]))
works fine:
Region 1 Region 2 ...
9811 3451 ...
but we should also be able to do
print(robjects.r.table("df$REGION"))
This, however, results in
df$REGION
1
(which it does also for column names that do not exist at all); also:
print(robjects.r.table(robjects.r["df"]["REGION"]))
gives an error:
TypeError: SexpVector indices must be integers, not str
Now, the docs say that names cannot be used for subsetting in Python. Am I correct to assume that the column names are not imported with the rest of the data when loading the data frame with Python/rpy2? Am I thus correct that the easiest way to access them is to save and load them as a separate list and construct a dict or similar in Python, mapping the names to the column index numbers? This does not seem very generic, however. Is there a way to extract the column names directly?
The versions of R, python, rpy2 I use are: R: 3.2.2 python: 3.5.0 rpy2: 2.7.8
To access a specific column in a dataframe by name, you use the $ operator in the form df$name where df is the name of the dataframe, and name is the name of the column you are interested in. This operation will then return the column you want as a vector.
To find the column names and row names in an R data frame, we can use the row.names and colnames functions.
The colnames() function in R gets or sets the column names of a data frame. Columns can be renamed by assigning the new column names as a character vector.
Data in data frames can be addressed by index (subsetting), by logical vector, or by name (columns only). Use the $ operator to address a column by name.
When doing the following, you are loading whatever objects are in Database02.Rda into R's "global environment".
import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")
robjects.globalenv is an Environment. You can list its content with:
tuple(robjects.globalenv.keys())
Now, as I understand it, one of your objects is called df. You can access it with:
df = robjects.globalenv['df']
If df is a list or a data frame, you can access its named elements with rx2 (the doc is your friend here again). To get the one called REGION, do:
df.rx2("REGION")
Listing all named elements in a list or data frame is easy:
tuple(df.names)
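As for the NULL in the question: robjects.r.names("df") hands R the character string "df" rather than the variable df, and a plain string has no names. With the names available from df.names, the name-to-index dict the question asks about no longer needs a separately saved list. Below is a minimal sketch; column_index_map is a hypothetical helper name, and example_names is a made-up stand-in for tuple(df.names) so that the snippet runs without R installed.

```python
def column_index_map(names):
    """Map each column name to its 0-based position (Python-style index)."""
    return {name: position for position, name in enumerate(names)}


# Stand-in for tuple(df.names); with rpy2 loaded you would pass df.names instead.
example_names = ("ID", "AGE", "REGION")

index_of = column_index_map(example_names)
print(index_of["REGION"])  # prints 2
```

The positions are 0-based, so they match Python-style indexing such as df[213] from the question. rx2 follows R's 1-based convention when given an integer, so add 1 if you pass a position to it, or simply pass the name.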
If you run R code in Python, the global environment answer will not work. But kudos to @lgautier, the creator/maintainer of this package. In R the dollar sign $ is used frequently. This is what I learned:
print(pamk_clusters$pamobject$clusinfo)
will not work, and its equivalent
print(pamk_clusters[["pamobject"]][["clusinfo"]])
also will not work. However, after some digging in the documentation (the "man"), this works as expected:
print(pamk_clusters.rx2("pamobject").rx2("clusinfo"))
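When the nesting gets deeper, the chained rx2 calls can be folded into one helper. This is only a sketch: rx2_path is a hypothetical name, not part of rpy2, and FakeRList is a stand-in class that mimics only the .rx2(name) method so that the example runs without R. With a real rpy2 object, the call would be rx2_path(pamk_clusters, "pamobject", "clusinfo").

```python
from functools import reduce


def rx2_path(obj, *names):
    """Follow a chain of named elements, like obj$a$b in R, via repeated .rx2() calls."""
    return reduce(lambda current, name: current.rx2(name), names, obj)


class FakeRList:
    """Stand-in for an rpy2 list vector; only the .rx2(name) lookup is imitated."""

    def __init__(self, items):
        self._items = items

    def rx2(self, name):
        return self._items[name]


# Made-up nested structure shaped like pamk_clusters$pamobject$clusinfo.
demo = FakeRList({"pamobject": FakeRList({"clusinfo": "cluster summary"})})
print(rx2_path(demo, "pamobject", "clusinfo"))  # prints: cluster summary
```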
I commented in the forums about "man" clarity:
https://bitbucket.org/rpy2/rpy2/issues/436/acessing-dataframe-elements-using-rpy2
I am using rpy2 on Win7 with ipython. To help others dig through the formatting, here is a setup that seems to work:
import rpy2
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
base = importr('base')
utils = importr('utils')
utils.chooseCRANmirror(ind=1)
cluster = importr('cluster')
stats = importr('stats')
#utils.install_packages("fpc")
fpc = importr('fpc')
import pickle
with open('points', 'rb') as fp:
    points = pickle.load(fp)
# data above is stored as binary object
# online: http://www.mshaffer.com/arizona/dissertation/points
import rpy2.robjects.numpy2ri as npr
npr.activate()
k = robjects.IntVector(range(3, 8)) # r-syntax 3:7 # I expect 5
pamk_clusters = fpc.pamk(points,k)
print( base.summary(pamk_clusters) )
base.print( base.summary(pamk_clusters) )
utils.str(pamk_clusters)
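# The next three lines do not work from Python, as noted above (the $ forms are R syntax), so they are commented out:
# print(pamk_clusters$pamobject$clusinfo)
# base.print(pamk_clusters$pamobject$clusinfo)
# print(pamk_clusters[["pamobject"]][["clusinfo"]])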
print(pamk_clusters.rx2("pamobject").rx2("clusinfo"))
pam_clusters = cluster.pam(points,5) # much slower
kmeans_clusters = stats.kmeans(points,5) # much faster
utils.str(kmeans_clusters)
print(kmeans_clusters.rx2("cluster"))
R has been a standard for statistical computing for nearly 25 years and is based on the forty-year-old S language, from a time when computing efficiency mattered a lot.
https://en.wikipedia.org/wiki/R_(programming_language)
Again, @lgautier, thank you for making R more readily accessible within Python.