 

Including and distributing third party libraries with a Python C extension

I'm building a C Python extension which makes use of a "third party" library, in this case one that I've built using a separate build process and toolchain. Call this library libplumbus.dylib.

Directory structure would be:

grumbo/
  include/
    plumbus.h
  lib/
    libplumbus.so
  grumbo.c
  setup.py

My setup.py looks approximately like:

from setuptools import Extension, setup

native_module = Extension(
    'grumbo',
    define_macros = [('MAJOR_VERSION', '1'),
                     ('MINOR_VERSION', '0')],
    sources       = ['grumbo.c'],
    include_dirs  = ['include'],
    libraries     = ['plumbus'],
    library_dirs  = ['lib'])


setup(
    name = 'grumbo',
    version = '1.0',
    ext_modules = [native_module] )

Since libplumbus is an external library, when I run import grumbo I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/path/to/grumbo/grumbo.cpython-37m-darwin.so, 2): Library not loaded: lib/libplumbus.dylib
  Referenced from: /path/to/grumbo/grumbo.cpython-37m-darwin.so
  Reason: image not found

What's the simplest way to set things up so that libplumbus is included with the distribution and properly loaded when grumbo is imported? (Note that this should work with a virtualenv).

I have tried adding lib/libplumbus.dylib to package_data, but this doesn't work, even if I add -Wl,-rpath,@loader_path/grumbo/lib to the Extension's extra_link_args.

asked Sep 09 '20 04:09 by trbabb




1 Answer

The goal of this answer is a setup.py that creates a source distribution. That means that after running

python setup.py sdist

the resulting dist/grumbo-1.0.tar.gz can be installed via

pip install grumbo-1.0.tar.gz

We will start with a setup.py for Linux/MacOS and then tweak it to make it work on Windows as well.


The first step is to get the additional data (headers/library) into the distribution. I'm not sure it is really impossible to add data files to a plain module, but setuptools offers functionality to add data files to packages, so let's turn your module into a package (which is probably a good idea anyway).

The new structure of package grumbo looks as follows:

src/
  grumbo/
     __init__.py  # empty
     grumbo.c
     include/
       plumbus.h
     lib/
       libplumbus.so
setup.py

and the changed setup.py:

from setuptools import setup, Extension, find_packages

native_module = Extension(
                name='grumbo.grumbo',
                sources = ["src/grumbo/grumbo.c"],
              )
kwargs = {
      'name' : 'grumbo',
      'version' : '1.0',
      'ext_modules' :  [native_module],
      'packages':find_packages(where='src'),
      'package_dir':{"": "src"},
}

setup(**kwargs)

It doesn't do much yet, but at least setuptools can find our package. The build fails, though, because the headers are missing.

Now let's add the needed headers from the include folder to the distribution via package_data:

...
kwargs = {
      ...,
      'package_data' : { 'grumbo': ['include/*.h']},
}
...

With that, our header files are copied into the source distribution. However, because the extension will be built "somewhere" we don't know in advance, adding include_dirs = ['include'] to the Extension definition just doesn't cut it.

There must be a better (and less brittle) way to find the right include path, but this is what I came up with:

...
import os
import sys
import sysconfig
def path_to_build_folder():
    """Returns the name of a distutils build directory"""
    f = "{dirname}.{platform}-{version[0]}.{version[1]}"
    dir_name = f.format(dirname='lib',
                    platform=sysconfig.get_platform(),
                    version=sys.version_info)
    return os.path.join('build', dir_name, 'grumbo')

native_module = Extension(
                ...,
                include_dirs  = [os.path.join(path_to_build_folder(),'include')],
)
...

Now the extension is built, but it cannot be loaded yet, because it is not linked against the shared object libplumbus.so and thus some symbols are unresolved.

Similar to the header files, we can add our library to the distribution:

kwargs = {
          ...,
          'package_data' : { 'grumbo': ['include/*.h', 'lib/*.so']},
}
...

and add the right lib-path for the linker:

...
native_module = Extension(
                ...
                libraries     = ['plumbus'],
                library_dirs  = [os.path.join(path_to_build_folder(), 'lib')],
              )
...

Now, we are almost there:

  • the extension is built and put into site-packages/grumbo/
  • the extension depends on libplumbus.so, as can be seen with ldd
  • libplumbus.so is put into site-packages/grumbo/lib
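The ldd check from the list above can also be scripted. A minimal sketch, Linux only and assuming ldd is on the PATH (the helper names are hypothetical), that lists the dependencies the loader cannot resolve; at this stage it would report libplumbus.so:

```python
import subprocess


def missing_shared_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        # unresolved entries look like: libplumbus.so => not found
        if line.endswith('=> not found'):
            missing.append(line.split('=>')[0].strip())
    return missing


def check_extension(path):
    """Linux only: run ldd on a built extension and list its unresolved dependencies."""
    result = subprocess.run(['ldd', path], capture_output=True,
                            text=True, check=True)
    return missing_shared_libs(result.stdout)
```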

However, we still cannot import the extension, as import grumbo.grumbo leads to

ImportError: libplumbus.so: cannot open shared object file: No such file or directory

because the loader cannot find the needed shared object, which resides in the lib folder next to our extension. We can use an rpath to "help" the loader:

...
native_module = Extension(
                ...
                extra_link_args = ["-Wl,-rpath=$ORIGIN/lib/."],
              )
...
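To confirm that the linker actually embedded the path, readelf -d lists it as an RPATH entry or, with linkers that default to new dtags, a RUNPATH entry; either works here because libplumbus.so is a direct dependency of the extension. A minimal sketch, Linux only and assuming binutils' readelf is installed (the helper names are hypothetical):

```python
import re
import subprocess

# readelf -d prints e.g. "(RUNPATH)  Library runpath: [$ORIGIN/lib/.]"
_SEARCH_PATH_RE = re.compile(r'Library (?:rpath|runpath): \[(.*)\]')


def embedded_search_paths(readelf_output):
    """Extract the RPATH/RUNPATH entries from `readelf -d` output."""
    paths = []
    for match in _SEARCH_PATH_RE.finditer(readelf_output):
        # several directories are separated by ':'
        paths.extend(match.group(1).split(':'))
    return paths


def read_search_paths(path):
    """Linux only: run readelf -d on a built extension."""
    result = subprocess.run(['readelf', '-d', path], capture_output=True,
                            text=True, check=True)
    return embedded_search_paths(result.stdout)
```

For the extension built with the flag above, read_search_paths() should return ['$ORIGIN/lib/.'].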

And now we are done:

>>> import grumbo.grumbo
# works!

Also building and installing a wheel should work:

python setup.py bdist_wheel

and then:

pip install grumbo-1.0-xxxx.whl
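Before installing, it can be worth checking that the headers and the shared library really ended up in the archives, since a package_data pattern that matches nothing does not stop the build. A minimal sketch using only the standard library (the function names are hypothetical); an sdist is a .tar.gz file and a wheel is a zip file:

```python
import fnmatch
import tarfile
import zipfile


def archive_members(path):
    """List the file names in an sdist (.tar.gz) or a wheel (.whl, a zip file)."""
    if path.endswith(('.whl', '.zip')):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    with tarfile.open(path, 'r:*') as tf:
        return tf.getnames()


def missing_patterns(path, patterns):
    """Return the glob patterns that match no file in the archive."""
    names = archive_members(path)
    return [p for p in patterns
            if not any(fnmatch.fnmatch(name, p) for name in names)]
```

For example, missing_patterns('dist/grumbo-1.0.tar.gz', ['*grumbo/include/*.h', '*grumbo/lib/*.so']) should return an empty list. Because fnmatch's * also matches /, the same patterns work for the sdist (where files sit under grumbo-1.0/src/) and for the wheel.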

The first milestone is achieved. Now let's extend it so that it works on other platforms as well.


Same source distribution for Linux and MacOS:

To be able to install the same source distribution on Linux and MacOS, both versions of the shared library (for Linux and MacOS) must be present. One option is to add a suffix to the names of the shared objects, e.g. libplumbus.linux.so and libplumbus.macos.so. The right shared object can then be picked in setup.py depending on the platform:

...
import platform
def pick_library():
    my_system = platform.system()
    if my_system == 'Linux':
        return "plumbus.linux"
    if my_system == 'Darwin':
        return "plumbus.macos"
    if my_system == 'Windows':
        return "plumbus"
    raise ValueError("Unknown platform: " + my_system)

native_module = Extension(
                ...
                libraries     = [pick_library()],
                ...
              )
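A variant of pick_library that also checks that the file for the current platform is actually present, so that a missing library fails before compilation rather than with a linker error. The table and the function name are hypothetical; they follow the file names used in this answer.

```python
import os
import platform

# hypothetical table: platform.system() -> (name for libraries=..., file in lib/)
_PLUMBUS_FILES = {
    'Linux': ('plumbus.linux', 'libplumbus.linux.so'),
    'Darwin': ('plumbus.macos', 'libplumbus.macos.so'),
    'Windows': ('plumbus', 'plumbus.lib'),
}


def pick_library_checked(lib_dir, system=None):
    """Like pick_library(), but fails early if this platform's file is missing."""
    system = system or platform.system()
    if system not in _PLUMBUS_FILES:
        raise ValueError("Unknown platform: " + system)
    name, filename = _PLUMBUS_FILES[system]
    if not os.path.isfile(os.path.join(lib_dir, filename)):
        raise FileNotFoundError("expected %s in %s" % (filename, lib_dir))
    return name
```

Assuming setup.py is run from the project root, it would be used as libraries=[pick_library_checked('src/grumbo/lib')].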

Tweaking for Windows:

On Windows, dynamic libraries are dlls and not shared objects, so there are some differences that need to be taken into account:

  • when the C extension is built, it needs the import library plumbus.lib, which we put into the lib subfolder.
  • when the C extension is loaded at run time, it needs plumbus.dll.
  • Windows has no notion of rpath, so we need to put the dll right next to the extension so that it can be found (see also this SO post for more details).

That means the folder structure should be as follows:

src/
  grumbo/
     __init__.py
     grumbo.c
     plumbus.dll           # needed for Windows
     include/
       plumbus.h
     lib/
       libplumbus.linux.so # needed on Linux
       libplumbus.macos.so # needed on MacOS
       plumbus.lib         # needed on Windows
setup.py

There are also some changes in setup.py. First, extend package_data so that the dll and lib files are picked up:

...
kwargs = {
      ...
      'package_data' : { 'grumbo': ['include/*.h', 'lib/*.so',
                                    'lib/*.lib', '*.dll',      # for windows
                                   ]},
}
...

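Second, there is no rpath on Windows, so the linker flag is only passed on Linux/MacOS. Note that the MacOS linker understands neither $ORIGIN nor the -rpath= form; it uses @loader_path instead:

def get_extra_link_args():
    my_system = platform.system()
    if my_system == 'Windows':
        return []  # no rpath; plumbus.dll sits next to the extension
    if my_system == 'Darwin':
        # MacOS: the dylib must also have an @rpath/... install name
        return ["-Wl,-rpath,@loader_path/lib"]
    return ["-Wl,-rpath=$ORIGIN/lib/."]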

native_module = Extension(
                ...
                extra_link_args = get_extra_link_args(),
              )
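On MacOS, the question's error message (Library not loaded: lib/libplumbus.dylib) points at a second issue: dyld uses the install name recorded in the dylib, and rpaths are only consulted for install names that start with @rpath/. The rpath on MacOS is written with @loader_path rather than $ORIGIN, e.g. -Wl,-rpath,@loader_path/lib. A minimal sketch for rewriting the install name, assuming Apple's install_name_tool is available and that you run it once on the library before creating the sdist (the helper names are hypothetical):

```python
import os
import platform
import subprocess


def rpath_id_command(dylib_path):
    """Build an install_name_tool call that sets the dylib's id to @rpath/<file name>."""
    return ['install_name_tool', '-id',
            '@rpath/' + os.path.basename(dylib_path), dylib_path]


def set_rpath_install_name(dylib_path):
    """MacOS only: rewrite the install name in place; returns False elsewhere."""
    if platform.system() != 'Darwin':
        return False
    subprocess.run(rpath_id_command(dylib_path), check=True)
    return True
```

After set_rpath_install_name('src/grumbo/lib/libplumbus.macos.so') and a rebuild with -Wl,-rpath,@loader_path/lib, otool -L on the extension should list @rpath/libplumbus.macos.so, which dyld then resolves relative to the extension's folder.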

That's it!


The complete setup file (you might want to add macro definitions or similar, which I've skipped):

import os
import platform
import sys
import sysconfig

from setuptools import setup, Extension, find_packages


def path_to_build_folder():
    """Returns the name of a distutils build directory"""
    f = "{dirname}.{platform}-{version[0]}.{version[1]}"
    dir_name = f.format(dirname='lib',
                        platform=sysconfig.get_platform(),
                        version=sys.version_info)
    return os.path.join('build', dir_name, 'grumbo')


def pick_library():
    """Picks the platform-specific library in lib/"""
    my_system = platform.system()
    if my_system == 'Linux':
        return "plumbus.linux"
    if my_system == 'Darwin':
        return "plumbus.macos"
    if my_system == 'Windows':
        return "plumbus"
    raise ValueError("Unknown platform: " + my_system)


def get_extra_link_args():
    """rpath so that the loader finds lib/ next to the extension"""
    my_system = platform.system()
    if my_system == 'Windows':
        return []  # no rpath; plumbus.dll sits next to the extension
    if my_system == 'Darwin':
        # MacOS: the dylib must also have an @rpath/... install name
        return ["-Wl,-rpath,@loader_path/lib"]
    return ["-Wl,-rpath=$ORIGIN/lib/."]


native_module = Extension(
    name='grumbo.grumbo',
    sources=["src/grumbo/grumbo.c"],
    include_dirs=[os.path.join(path_to_build_folder(), 'include')],
    libraries=[pick_library()],
    library_dirs=[os.path.join(path_to_build_folder(), 'lib')],
    extra_link_args=get_extra_link_args(),
)

kwargs = {
    'name': 'grumbo',
    'version': '1.0',
    'ext_modules': [native_module],
    'packages': find_packages(where='src'),
    'package_dir': {"": "src"},
    'package_data': {'grumbo': ['include/*.h', 'lib/*.so',
                                'lib/*.lib', '*.dll',  # for Windows
                                ]},
}

setup(**kwargs)
answered Nov 01 '22 16:11 by ead