Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can we shed some definitive light on how python packaging and import works?

I had my fair chance of getting through the python management of modules, and every time is a challenge: packaging is not what people do every day, and it becomes a burden to learn, and a burden to remember, even when you actually do it, since this happens normally once.

I would like to collect here the definitive overview of how import, package management and distribution works in python, so that this question becomes the definitive explanation for all the magic that happens under the hood. Although I understand the broad level of the question, these things are so intertwined that any focused answer will not solve the main problem: understand how all works, what is outdated, what is current, what are just alternatives for the same task, what are the quirks.

The list of keywords to refer to is the following, but this is just a sample out of the bunch. There's a lot more and you are welcome to add additional details.

  • PyPI
  • setuptools / Distribute
  • distutils
  • eggs
  • egg-link
  • pip
  • zipimport
  • site.py
  • site-packages
  • .pth files
  • virtualenv
  • handling of compiled modules in eggs (with and without installation via easy_install)
  • use of get_data()
  • pypm
  • bento
  • PEP 376
  • the cheese shop
  • eggsecutable

Linking to other answers is probably a good idea. As I said, this question is for the high-level overview.

like image 477
Stefano Borini Avatar asked Apr 19 '11 10:04

Stefano Borini


People also ask

How does Python packaging work?

Python's native packaging is mostly built for distributing reusable code, called libraries, between developers. You can piggyback tools, or basic applications for developers, on top of Python's library packaging, using technologies like setuptools entry_points. Libraries are building blocks, not complete applications.

How do you do packaging in Python?

Follow the below steps to create a package in PythonCreate a directory and include a __init__.py file in it to tell Python that the current directory is a package. Include other sub-packages or files you want. Next, access them with the valid import statements.

How packages are imported in Python?

We can import modules from packages using the dot (.) operator. Now, if this module contains a function named select_difficulty() , we must use the full name to reference it. Now we can directly call this function.

Can we use package in Python?

Use of packages helps importing any modules, individually or whole. While importing a package or sub packages or modules, Python searches the whole tree of directories looking for the particular package and proceeds systematically as programmed by the dot operator.


1 Answers

For the most part, this is an attempt to look at the packaging/distribution side, not the mechanics of import. Unfortunately, packaging is the place where Python provides way more than one way to do it. I'm just trying to get the ball rolling, hopefully others will help fill what I miss or point out mistakes.

First of all there's some messy terminology here. A directory containing an __init__.py file is a package. However, most of what we're talking about here are specific versions of packages published on PyPI, one of it's mirrors, or in a vendor specific package management system like Debian's Apt, Redhat's Yum, Fink, Macports, Homebrew, or ActiveState's pypm.

These published packages are what folks are trying to call "Distributions" going forward in an attempt to use "Package" only as the Python language construct. You can see some of that usage in PEP-376 PEP-376.

Now, your list of keywords relate to several different aspects of the Python Ecosystem:

Finding and publishing python distributions:

  • PyPI (aka the cheese shop)
  • PyPI Mirrors
  • Various package management tools / systems: apt, yum, fink, macports, homebrew
  • pypm (ActiveState's alternative to PyPI)

The above are all services that provide a place to publish Python distributions in various formats. Some, like PyPI mirrors and apt / yum repositories can be run on your local machine or within your companies network but folks typically use the official ones. Most, if not all provide a tool (or multiple tools in the case of PyPI) to help find and download distributions.

Libraries used to create and install distributions:

  • setuptools / Distribute
  • distutils

Distutils is the standard infrastructure on which Python packages are compiled and built into distributions. There's a ton of functionality in distutils but most folks just know:

from distutils.core import setup  setup(name='Distutils',       version='1.0',       description='Python Distribution Utilities',       author='Greg Ward',       author_email='[email protected]',       url='http://www.python.org/sigs/distutils-sig/',       packages=['distutils', 'distutils.command'],  ) 

And to some extent that's a most of what you need. With the prior 9 lines of code you have enough information to install a pure Python package and also the minimal metadata required to publish that package a distribution on PyPI.

Setuptools provides the hooks necessary to support the Egg format and all of it's features and foibles. Distribute is an alternative to Setuptools that adds some features while trying to be mostly backwards compatible. I believe Distribute is going to be included in Python 3 as the successor to Distutil's from distutils.core import setup.

Both Setuptools and Distribute provide a custom version of the distutils setup command that does useful things like support the Egg format.

Python Distribution Formats:

  • source
  • eggs

Distributions are typically provided either as source archives (tarball or zipfile). The standard way to install a source distribution is by downloading and uncompressing the archive and then running the setup.py file inside.

For example, the following will download, build, and install the Pygments syntax highlighting library:

curl -O -G http://pypi.python.org/packages/source/P/Pygments/Pygments-1.4.tar.gz tar -zxvf Pygments-1.4.tar.gz cd Pygments-1.4 python setup.py build sudo python setup.py install 

Alternatively you can download the Egg file and install it. Typically this is accomplished by using easy_install or pip:

sudo easy_install pygments or  sudo pip install pygments 

Eggs were inspired by Java's Jarfiles and they have quite a few features you should read about here

Python Package Formats:

  • uncompressed directories
  • zipimport (zip compressed directories)

A normal python package is just a directory containing an __init__.py file and an arbitrary number of additional modules or sub-packages. Python also has support for finding and loading source code within *.zip files as long as they are included on the PYTHONPATH (sys.path).

Installing Python Packages:

  • easy_install: the original egg installation tool, depends on setuptools
  • pip: currently the most popular way to install python packages. Similar to easy_install but more flexible and has some nice features like requirements files to help document dependencies and reproduce deployments.
  • pypm, apt, yum, fink, etc

Environment Management / Automated Deployment:

  • bento
  • buildout
  • virtualenv (and virtualenvwrapper)

The above tools are used to help automate and manage dependencies for a Python project. Basically they give you tools to describe what distributions your application requires and automate the installation of those specific versions of your dependencies.

Locations of Packages / Distributions:

  • site-packages
  • PYTHONPATH
  • the current working directory (depends on your OS and environment settings)

By default, installing a python distribution is going to drop it into the site-packages directory. That directory is usually something like /usr/lib/pythonX.Y/site-packages.

A simple programmatic way to find your site-packages directory:

from distuils import sysconfig print sysconfig.get_python_lib() 

Ways to modify your PYTHONPATH:

Python's import statement will only find packages that are located in one of the directories included in your PYTHONPATH.

You can inspect and change your path from within Python by accessing:

import sys print sys.path sys.path.append("/home/myname/lib") 

Besides that, you can set the PYTHONPATH environment variable like you would any other environment variable on your OS or you could use:

  • .pth files: *.pth files located in directories that are already on your PYTHONPATH are read and each line of the *.pth file is added to your PYTHONPATH. Basically any time you would copy a package into a directory on your PYTHONPATH you could instead create a mypackages.pth. Read more about *.pth files: site module
  • egg-link files: Internal structure of python eggs they are a cross platform alternative to symbolic links. Creating an egg link file is similar to creating a pth file.
  • site.py modifications

To add the above /home/myname/lib to site-packages with a *.pth file you'd create a *.pth file. The name of the file doesn't matter but you should still probably choose something sensible.

Let's create myname.pth:

# myname.pth /home/myname/lib 

That's it. Drop that into sysconfig.get_python_lib() on your system or any other directory in your PYTHONPATH and /home/myname/lib will be added to the path.

like image 92
stderr Avatar answered Sep 27 '22 16:09

stderr