Packaging supporting R code in a Python module?

I am trying to package some of my Python code that calls R code using rpy2. That R code currently sits in a separate file which I source from the Python script. For example, if the Python script is myscript.py, then the R code is stored in myscript_support.R, and I have something like the following in myscript.py:

import os

from rpy2.robjects import r

# Load the R code from the file that sits next to this script
r.source(os.path.join(os.path.dirname(__file__), "myscript_support.R"))

# Look up the R function by name and call it
r["myscript_R_function"]()

I now want to package this Python script using setuptools, and I have a few questions:

  1. How should I package the R support code, and once I have done so, how do I find the path to the R file so I can source it?

  2. The R code depends on several R packages. How can I ensure that these are installed? Should I just raise an informative error if these R packages cannot be loaded?

Ryan C. Thompson asked Apr 05 '11 18:04



2 Answers

This question might be dated, but I ran into the same issue today and wanted to provide more detail for the question 1 solution suggested by @ivan_pozdeev and a new solution for question 2.

1) Edit your setup.py file to:

from setuptools import setup, find_packages

setup(
    ...
    # If any package contains *.r files, include them:
    package_data={'': ['*.r', '*.R']},
    include_package_data=True,
)

2) Conda is quickly becoming a good option for dealing with package dependencies across both Python and R. You can create an environment (http://conda.pydata.org/docs/using/envs), install all the R and Python packages that you need, and then generate an environment.yml file so that anyone can replicate your environment. Check out this blog for more info: https://www.continuum.io/content/conda-data-science

jsignell answered Sep 20 '22 16:09


Well, imagine yourself as the setuptools packager and think of what you would expect the programmer to do.

  • Setuptools knows nothing about R, its file structure, or the fact that your code uses R files.
  • Your R interpreter knows nothing about importing files from Python .egg files.

For the first problem, you have two choices:

  1. Tell setuptools to just include some additional files without caring what they are.
  2. Teach setuptools about R: how to determine which R files your program uses and how to track and include their dependencies.

The first option can be implemented by passing include_package_data = True to setup() and listing glob patterns for the files to include in package_data (setuptools docs, "Including Data Files" section). Paths relative to the packages' directories can be used. The files will be accessible at run time at the same relative paths through the "Resource Management API" ("Accessing Data Files at Runtime" section).

The second option would require you to hook your own code into setuptools before invoking setup(). For example, you may add a file finder that adds the relevant .R files to the results of find_packages(). Or you can simply generate the list of files for the previous paragraph by arbitrary means.
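As a rough illustration of generating that file list "by arbitrary means", the sketch below walks a package directory and collects R source files. The helper name find_r_files and the package name mypackage are hypothetical and not part of setuptools.

```python
import os


def find_r_files(package_dir):
    """Return R source files under package_dir as paths relative to it."""
    matches = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            # Accept both the .R and .r spellings of the extension.
            if name.lower().endswith(".r"):
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, package_dir)
                # package_data expects forward slashes on every platform.
                matches.append(rel_path.replace(os.sep, "/"))
    return sorted(matches)
```

In setup.py the result could then be passed as package_data={"mypackage": find_r_files("mypackage")}, since package_data entries are interpreted relative to the package directory.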

For the second problem, the easiest way is to force setuptools to install the package as a directory rather than an .egg by specifying zip_safe = False. Alternatively, you can use the eager_resources option, which extracts a group of resources on demand ("Automatic Resource Extraction" section).
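Once the R file is installed with the package, the Python side still needs a real filesystem path to hand to R. Here is a minimal sketch, assuming Python 3.9 or later. It uses the standard library's importlib.resources rather than the setuptools Resource Management API mentioned above, and the function names are hypothetical.

```python
import importlib.resources


def r_support_path(package, filename="myscript_support.R"):
    """Return a context manager that yields a filesystem path to a bundled file.

    If the package is installed as a plain directory, the path points at the
    installed file; if it lives in an archive, a temporary copy is extracted.
    """
    resource = importlib.resources.files(package).joinpath(filename)
    return importlib.resources.as_file(resource)


def source_r_support(package, filename="myscript_support.R"):
    """Source the bundled R file into the embedded R session."""
    # Imported lazily so that the path lookup above works without rpy2.
    from rpy2.robjects import r

    with r_support_path(package, filename) as path:
        r.source(str(path))
```

Inside the package this might be called as source_r_support(__package__) before looking up r["myscript_R_function"].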

As for installing third-party R packages, an automatable technique is described in R Installation and Administration, "Installing packages".
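If you would rather raise an informative error than install anything automatically, the check can be done from Python. The sketch below is one possible approach, assuming an rpy2 version that provides rpy2.robjects.packages.isinstalled. The function names are hypothetical, and the is_installed parameter exists only so the check can be swapped out or tested without R.

```python
def missing_r_packages(required, is_installed=None):
    """Return the names in `required` that R does not report as installed."""
    if is_installed is None:
        # Assumption: the installed rpy2 exposes isinstalled() here.
        from rpy2.robjects.packages import isinstalled as is_installed
    return [name for name in required if not is_installed(name)]


def require_r_packages(required, is_installed=None):
    """Raise ImportError naming any missing R packages and how to install them."""
    missing = missing_r_packages(required, is_installed)
    if missing:
        quoted = ", ".join('"%s"' % name for name in missing)
        raise ImportError(
            "Missing R packages: %s. Install them from R with "
            "install.packages(c(%s))." % (", ".join(missing), quoted)
        )
```

Calling require_r_packages([...]) with the names of your R dependencies once at import time, before sourcing the R support file, would make the failure show up early with a clear message.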

ivan_pozdeev answered Sep 17 '22 16:09