
setuptools: package data folder location

I use setuptools to distribute my Python package. Now I need to distribute additional data files.

From what I've gathered from the setuptools documentation, I need to have my data files inside the package directory. However, I would rather have my data files inside a subdirectory of the root directory.

What I would like to avoid:

    / #root
    |- src/
    |  |- mypackage/
    |  |  |- data/
    |  |  |  |- resource1
    |  |  |  |- [...]
    |  |  |- __init__.py
    |  |  |- [...]
    |- setup.py

What I would like to have instead:

    / #root
    |- data/
    |  |- resource1
    |  |- [...]
    |- src/
    |  |- mypackage/
    |  |  |- __init__.py
    |  |  |- [...]
    |- setup.py

I just don't feel comfortable with having so many subdirectories if it's not essential. I can't find a reason why I /have/ to put the files inside the package directory. It is also cumbersome to work with so many nested subdirectories, IMHO. Or is there a good reason that would justify this restriction?

phant0m asked Dec 23 '10 13:12



2 Answers

Option 1: Install as package data

The main advantage of placing data files inside the root of your Python package is that you don't have to worry about where the files will live on a user's system, which may be Windows, Mac, Linux, some mobile platform, or inside an Egg. You can always find the data directory relative to your Python package root, no matter where or how the package is installed.

For example, if I have a project layout like so:

    project/
        foo/
            __init__.py
            data/
                resource1/
                    foo.txt

You can add a function to __init__.py to locate an absolute path to a data file:

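    import os

    # Absolute path of the directory containing this __init__.py
    _ROOT = os.path.abspath(os.path.dirname(__file__))

    def get_data(path):
        """Return the absolute path to a file under the package's data/ directory."""
        return os.path.join(_ROOT, 'data', path)

    print(get_data('resource1/foo.txt'))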

Outputs:

/Users/pat/project/foo/data/resource1/foo.txt 

After the project is installed as an Egg the path to data will change, but the code doesn't need to change:

/Users/pat/virtenv/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg/foo/data/resource1/foo.txt 
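
The claim that the lookup code needs no changes can be checked without installing anything. The following is a minimal sketch, assuming a throwaway package named `foo_demo` (a hypothetical name, not part of the answer) created in a temporary directory; it embeds the same `get_data` helper and confirms that the returned path points at a real file wherever the package happens to live.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package "foo_demo" (hypothetical name) in a temp directory.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "foo_demo"
(pkg / "data" / "resource1").mkdir(parents=True)
(pkg / "data" / "resource1" / "foo.txt").write_text("hello\n")

# The package's __init__.py contains the helper from the answer, unchanged.
(pkg / "__init__.py").write_text(
    "import os\n"
    "_ROOT = os.path.abspath(os.path.dirname(__file__))\n"
    "def get_data(path):\n"
    "    return os.path.join(_ROOT, 'data', path)\n"
)

# Make the temp directory importable and import the package from there.
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
foo_demo = importlib.import_module("foo_demo")

# The helper resolves the file relative to wherever the package was imported from.
resolved = foo_demo.get_data("resource1/foo.txt")
with open(resolved) as fh:
    content = fh.read()
print(resolved)
```

The printed path depends on the temporary directory chosen at run time, which is the point: the helper never hard-codes a location.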

Option 2: Install to fixed location

The alternative would be to place your data outside the Python package and then either:

  1. Have the location of data passed in via a configuration file, command line arguments or
  2. Embed the location into your Python code.

This is far less desirable if you plan to distribute your project. If you really want to do this, you can install your data wherever you like on the target system by specifying the destination for each group of files by passing in a list of tuples:

    from setuptools import setup

    setup(
        ...
        data_files=[
            ('/var/data1', ['data/foo.txt']),
            ('/var/data2', ['data/bar.txt']),
        ],
    )
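
If the data directory holds many files, writing each tuple by hand gets tedious. The sketch below shows one possible way to build a list in the same `(target_dir, [source_files])` shape by walking a directory. The helper name `collect_data_files` and its parameters are made up for illustration; it is not a setuptools function.

```python
import os


def collect_data_files(source_dir, target_root):
    """Return a list of (target_dir, [source_files]) tuples for data_files.

    Hypothetical helper: every directory under source_dir that contains
    files is mapped to the matching subdirectory of target_root.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        if not filenames:
            continue  # skip directories that hold no files themselves
        rel = os.path.relpath(dirpath, source_dir)
        target = target_root if rel == "." else os.path.join(target_root, rel)
        files = sorted(os.path.join(dirpath, name) for name in filenames)
        entries.append((target, files))
    # Sort so the result does not depend on the order os.walk visits directories.
    return sorted(entries)
```

The result could then be passed as `data_files=collect_data_files('data', '/var/data1')` in the `setup()` call, under the same caveats about fixed install locations.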

Updated: Example of a shell function to recursively grep Python files:

    atlas% function grep_py {
        find . -name '*.py' -exec grep -Hn "$@" {} \;
    }
    atlas% grep_py ": \["
    ./setup.py:9:    package_data={'foo': ['data/resource1/foo.txt']}

samplebias answered Sep 20 '22 08:09


I think I found a good compromise that lets you maintain the following structure:

    / #root
    |- data/
    |  |- resource1
    |  |- [...]
    |- src/
    |  |- mypackage/
    |  |  |- __init__.py
    |  |  |- [...]
    |- setup.py

You should install the data as package_data to avoid the problems described in samplebias's answer, but to maintain the file structure, add the following to your setup.py:

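    import os
    from setuptools import setup

    # Temporarily link the top-level data/ directory into the package so that
    # setuptools picks it up as package data.
    os.symlink('../../data', 'src/mypackage/data')
    try:
        setup(
            ...
            package_data={'mypackage': ['data/*']},
            ...
        )
    finally:
        # Always remove the link so the source tree stays unchanged.
        os.unlink('src/mypackage/data')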

This way we create the appropriate structure "just in time" and keep our source tree organized.
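
The create-then-clean-up pattern can also be wrapped in a context manager so the link is removed only if it was actually created. This is a minimal sketch; `temporary_symlink` is a hypothetical helper name, and it assumes a platform where `os.symlink` is permitted (on Windows this may require extra privileges).

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_symlink(target, link_path):
    """Create link_path pointing at target, and remove it again on exit.

    Hypothetical helper: a relative target is interpreted relative to the
    directory that contains link_path, as usual for symlinks.
    """
    os.symlink(target, link_path)
    try:
        yield link_path
    finally:
        # Runs even if the body raises, so no stray link is left behind.
        os.unlink(link_path)
```

In setup.py, the `setup(...)` call would then sit inside `with temporary_symlink('../../data', 'src/mypackage/data'):`.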

To access such data files within your code, you 'simply' use:

    from pkg_resources import Requirement, resource_filename

    data = resource_filename(Requirement.parse("main_package"), 'mypackage/data')

I still don't like having to specify 'mypackage' in the code, as the data might not necessarily have anything to do with this module, but I guess it's a good compromise.
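
On Python 3.9 or newer, the standard library's `importlib.resources.files` offers a comparable lookup without `pkg_resources`. This is not what the answer above uses; it is shown only as a sketch of an alternative, using a throwaway package named `mypackage_demo` (a hypothetical name) built in a temporary directory so the example is self-contained. The package name still has to be given, so it does not remove the drawback mentioned above.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package with a data/ subdirectory (names are hypothetical).
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "mypackage_demo"
(pkg / "data").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "data" / "resource1").write_text("payload\n")

# Make the package importable from the temp directory.
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

# files() returns a traversable object rooted at the package directory.
resource = files("mypackage_demo") / "data" / "resource1"
text = resource.read_text()
print(text)
```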

polvoazul answered Sep 22 '22 08:09