I am trying to package some of my Python code that calls R code using rpy2. That R code currently sits in a separate file which I source from the Python script. For example, if the Python script is myscript.py, then the R code is stored in myscript_support.R, and I have something like the following in myscript.py:
import os
from rpy2.robjects import r
# Load the R code
r.source(os.path.join(os.path.dirname(__file__), "myscript_support.R"))
# Call the R function (rpy2 looks up R objects with single brackets)
r["myscript_R_function"]()
I now want to package this Python script using setuptools, and I have a few questions:
1) How should I package the R support code, and once I have done so, how do I find the path to the R file so I can source it?
2) The R code depends on several R packages. How can I ensure that these are installed? Should I just raise an informative error if these R packages cannot be loaded?
Installing packages: R packages are usually downloaded from a package repository and installed locally. R itself provides the tools to do this, and from Python they can be used through rpy2, which exposes an interface to these R features.
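A minimal sketch of the "raise an informative error" idea from question 2, with an optional install step through rpy2. It assumes rpy2 and a working R installation; the function names missing_r_packages and ensure_r_packages are made up for illustration, and the package names in the usage line are placeholders.

```python
def missing_r_packages(required, is_installed):
    """Return the names in `required` for which `is_installed(name)` is false."""
    return [name for name in required if not is_installed(name)]


def ensure_r_packages(required, install=False):
    """Raise RuntimeError if any required R package is unavailable.

    With install=True, try to install the missing ones from CRAN first.
    """
    # Imported lazily so this module can be imported without R present.
    from rpy2.robjects.packages import importr, isinstalled
    from rpy2.robjects.vectors import StrVector

    missing = missing_r_packages(required, isinstalled)
    if missing and install:
        utils = importr("utils")
        utils.chooseCRANmirror(ind=1)  # assumption: first mirror in the list is fine
        utils.install_packages(StrVector(missing))
        missing = missing_r_packages(required, isinstalled)
    if missing:
        raise RuntimeError(
            "Missing R packages: " + ", ".join(missing)
            + ". Install them in R with install.packages()."
        )
```

Calling something like ensure_r_packages(["ggplot2", "hexbin"]) before r.source(...) would then fail early with a readable message instead of an error from inside the R script.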
This question might be dated, but I ran into the same issue today and wanted to provide more detail for the question 1 solution suggested by @ivan_pozdeev and a new solution for question 2.
1) Edit your setup.py file to:
from setuptools import setup, find_packages
setup(
    ...
    # If any package contains *.r or *.R files, include them:
    package_data={'': ['*.r', '*.R']},
    include_package_data=True,
)
2) Conda is quickly becoming a good option for managing dependencies across both Python and R. You can create an environment (http://conda.pydata.org/docs/using/envs), install all the R and Python packages that you need into it, and then generate an environment.yml file so that anyone can replicate your environment. Check out this blog for more info: https://www.continuum.io/content/conda-data-science
Well, imagine yourself as the setuptools packager and think of what you would expect the programmer to do.
For the first problem, you have two choices:
The first option can be implemented by passing include_package_data = True to setup() and providing masks of files to include in package_data (setuptools docs, "Including Data Files" section). Paths relative to the packages' directories can be used. The files will be accessible at run time at the same relative paths through the "Resource Management API" ("Accessing Data Files at Runtime" section).
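As a rough illustration of finding the packaged R file at run time, here is a sketch that uses the standard library's importlib.resources (Python 3.9+) rather than the setuptools "Resource Management API" mentioned above; that swap is my choice, not something from the setuptools docs. The helper name source_r_support is hypothetical, and `source` stands for any callable that takes a path string, such as rpy2's r.source.

```python
from importlib import resources


def source_r_support(package, filename, source):
    """Locate `filename` inside `package` and pass its path to `source`.

    `package` is an importable package name, `source` is a callable that
    accepts a filesystem path as a string (for example rpy2's r.source).
    """
    resource = resources.files(package) / filename
    # as_file() yields a real filesystem path, extracting the resource to a
    # temporary file first if the package is not installed as a directory.
    with resources.as_file(resource) as path:
        return source(str(path))
```

With rpy2 this would be called as source_r_support("mypackage", "myscript_support.R", r.source), where "mypackage" is a placeholder for the package that ships the R file.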
The second option would require you to run your own code before invoking setup(). For example, you may add a file finder that adds the relevant .R files to the results of find_packages(). Or just generate the list of files for the previous paragraph by arbitrary means.
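One possible way to "generate the list of files by arbitrary means" is to walk the package directory in setup.py before calling setup(). This is only a sketch; find_r_files is a made-up helper name, not part of setuptools.

```python
from pathlib import Path


def find_r_files(package_dir):
    """Return paths of all .r/.R files under `package_dir`, relative to it."""
    root = Path(package_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".r"
    )
```

The result could then be passed along as package_data={"mypackage": find_r_files("mypackage")}, with "mypackage" standing in for the real package directory.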
For the second problem, the easiest way is to force setuptools to install the package as a directory rather than an .egg by specifying zip_safe = False.
Alternatively, you might use the eager_resources option, which extracts a group of resources on demand ("Automatic Resource Extraction" section).
As for installing third-party R packages, an automatable technique is described in R Installation and Administration, "Installing packages" section.