Python C extension link with a custom shared library

I am writing a Python C extension on a very old Red Hat system. The system has zlib 1.2.3, which does not correctly support large files. Unfortunately, I can't just upgrade the system zlib to a newer version, since some of the packages poke into internal zlib structures and that breaks on newer zlib versions.

I would like to build my extension so that all the zlib calls (gzopen(), gzseek() etc.) are resolved to a custom zlib that I install in my user directory, without affecting the rest of the Python executable and other extensions.

I have tried statically linking libz.a by adding it to the gcc command line during linking, but it did not work (I still cannot create large files with gzopen(), for example). I also tried passing -z origin -Wl,-rpath=/path/to/zlib -lz to gcc, but that did not work either.

Since newer versions of zlib are still named zlib 1.x, the soname is the same, so I think symbol versioning would not work. Is there a way to do what I want to do?

I am on a 32-bit Linux system. Python version is 2.6, which is custom-built.

Edit:

I created a minimal example. I am using Cython (version 0.19.1).

File gztest.pyx:

from libc.stdio cimport printf, fprintf, stderr
from libc.string cimport strerror
from libc.errno cimport errno
from libc.stdint cimport int64_t

cdef extern from "zlib.h":
    ctypedef void *gzFile
    ctypedef int64_t z_off_t

    int gzclose(gzFile fp)
    gzFile gzopen(char *path, char *mode)
    int gzread(gzFile fp, void *buf, unsigned int n)
    z_off_t gztell(gzFile fp)
    char *gzerror(gzFile fp, int *errnum)

cdef void print_error(void *gzfp):
    # Report both the C errno and zlib's own error state.
    cdef int errnum = 0
    cdef const char *s = gzerror(gzfp, &errnum)
    fprintf(stderr, "error (%d): %s (%d: %s)\n", errno, strerror(errno), errnum, s)

cdef class GzFile:
    cdef gzFile fp
    cdef char *path
    def __init__(self, path, mode='rb'):
        self.path = path
        self.fp = gzopen(path, mode)
        if self.fp == NULL:
            raise IOError('%s: %s' % (path, strerror(errno)))

    cdef int read(self, void *buf, unsigned int n):
        cdef int r = gzread(self.fp, buf, n)
        if r <= 0:
            print_error(self.fp)
        return r

    # Current offset in the uncompressed stream; read_test() calls this.
    cdef z_off_t tell(self):
        return gztell(self.fp)

    cdef int close(self):
        # Return gzclose()'s status instead of discarding it.
        return gzclose(self.fp)

def read_test():
    cdef GzFile ifp = GzFile('foo.gz')
    cdef char buf[8192]
    cdef int i, j
    cdef int n
    errno = 0
    for 0 <= i < 0x200:
        for 0 <= j < 0x210:
            n = ifp.read(buf, sizeof(buf))
            if n <= 0:
                break

        if n <= 0:
            break

        printf('%lld\n', <long long>ifp.tell())

    printf('%lld\n', <long long>ifp.tell())
    ifp.close()

File setup.py:

import sys
import os

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

if __name__ == '__main__':
    if 'CUSTOM_GZ' in os.environ:
        d = {
            'include_dirs': ['/home/alok/zlib_lfs/include'],
            'extra_objects': ['/home/alok/zlib_lfs/lib/libz.a'],
            'extra_compile_args': ['-D_LARGEFILE_SOURCE', '-D_FILE_OFFSET_BITS=64', '-g3', '-ggdb']
        }
    else:
        d = {'libraries': ['z']}
    ext = Extension('gztest', sources=['gztest.pyx'], **d)
    setup(name='gztest', cmdclass={'build_ext': build_ext}, ext_modules=[ext])

My custom zlib is in /home/alok/zlib_lfs (zlib version 1.2.8):

$ ls ~/zlib_lfs/lib/
libz.a  libz.so  libz.so.1  libz.so.1.2.8  pkgconfig

To compile the module using this libz.a:

$ CUSTOM_GZ=1 python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/alok/zlib_lfs/include -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb
gcc -shared build/temp.linux-x86_64-2.6/gztest.o /home/alok/zlib_lfs/lib/libz.a -L/opt/lib -lpython2.6 -o /home/alok/gztest.so

gcc is being passed all the flags I want (adding full path to libz.a, large file flags, etc.).
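
One caveat when reading this log: as far as I know, distutils prints each command by joining its argument list with spaces, so the log looks the same whether the flags were passed as separate arguments or as one long string. A minimal sketch of the difference, using shlex.split() from the standard library (the flag string is the one from setup.py above; the variable names are only for illustration):

```python
import shlex

# One list element containing spaces reaches gcc as a single argument.
as_one_string = ['-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb']

# shlex.split() breaks the string into one list element per flag.
as_separate_args = shlex.split(as_one_string[0])

# Both render identically once joined with spaces for display.
assert ' '.join(as_one_string) == ' '.join(as_separate_args)

print(len(as_one_string), len(as_separate_args))  # 1 4
```

Passing one list element per flag avoids any ambiguity about what the compiler actually receives.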

To build the extension without my custom zlib, I can compile without CUSTOM_GZ defined:

$ python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o
gcc -shared build/temp.linux-x86_64-2.6/gztest.o -L/opt/lib -lz -lpython2.6 -o /home/alok/gztest.so

We can check the size of the gztest.so files:

$ stat --format='%s %n' original/gztest.so custom/gztest.so 
62398 original/gztest.so
627744 custom/gztest.so

So, the statically linked file is much larger, as expected.

I can now do:

>>> import gztest
>>> gztest.read_test()

and it will try to read foo.gz in the current directory.

When I do that using non-statically linked gztest.so, it works as expected until it tries to read more than 2 GB.
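
For reference, a quick check of the numbers, assuming the failing build uses a 32-bit signed file offset: the loops in read_test() request slightly more than 2 GiB of uncompressed data in total, so the test is sized to cross that boundary.

```python
# Largest offset a 32-bit signed off_t can hold (assumed for the system zlib build).
max_off32 = 2**31 - 1

# read_test() performs 0x200 * 0x210 reads of 8192 bytes each.
total_requested = 0x200 * 0x210 * 8192

print(max_off32)                    # 2147483647
print(total_requested)              # 2214592512
print(total_requested > max_off32)  # True
```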

When I do that using statically linked gztest.so, it dumps core:

$ python -c 'import gztest; gztest.read_test()'
error (2): No such file or directory (0: )
0
Segmentation fault (core dumped)

The error about No such file or directory is misleading: the file exists and gzopen() actually returns successfully. gzread() fails, though.

Here is the gdb backtrace:

(gdb) bt
#0  0xf730eae4 in free () from /lib/libc.so.6
#1  0xf70725e2 in ?? () from /lib/libz.so.1
#2  0xf6ce9c70 in __pyx_f_6gztest_6GzFile_close (__pyx_v_self=0xf6f75278) at gztest.c:1140
#3  0xf6cea289 in __pyx_pf_6gztest_2read_test (__pyx_self=<optimized out>) at gztest.c:1526
#4  __pyx_pw_6gztest_3read_test (__pyx_self=0x0, unused=0x0) at gztest.c:1379
#5  0xf769910d in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:3690
#6  PyEval_EvalFrameEx (f=0x8115c64, throwflag=0) at Python/ceval.c:2389
#7  0xf769a3b4 in PyEval_EvalCodeEx (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#8  0xf769a433 in PyEval_EvalCode (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4) at Python/ceval.c:522
#9  0xf76bbe1a in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1335
#10 PyRun_StringFlags (str=0x80a24c0 "import gztest; gztest.read_test()\n", start=257, globals=0xf6ff81c4, locals=0xf6ff81c4, flags=0xffbf2888) at Python/pythonrun.c:1298
#11 0xf76bd003 in PyRun_SimpleStringFlags (command=0x80a24c0 "import gztest; gztest.read_test()\n", flags=0xffbf2888) at Python/pythonrun.c:957
#12 0xf76ca1b9 in Py_Main (argc=1, argv=0xffbf2954) at Modules/main.c:548
#13 0x080485b2 in main ()

One of the problems seems to be that the second line in the backtrace refers to libz.so.1! If I do ldd gztest.so, I get, among other lines:

    libz.so.1 => /lib/libz.so.1 (0xf6f87000)

I am not sure why that is happening though.
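
One way to investigate this from inside the process is to list which libz shared objects are actually mapped. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper name mapped_zlib_paths is made up for illustration.

```python
import os

def mapped_zlib_paths(maps_file='/proc/self/maps'):
    """Return the distinct libz shared-object paths listed in a maps file."""
    if not os.path.exists(maps_file):
        return []  # not Linux, or /proc is unavailable
    paths = set()
    with open(maps_file) as f:
        for line in f:
            fields = line.split(None, 5)
            # The sixth field, when present, is the backing file path.
            if len(fields) == 6:
                path = fields[5].strip()
                if os.path.basename(path).startswith('libz.so'):
                    paths.add(path)
    return sorted(paths)

if __name__ == '__main__':
    import zlib  # in many Python builds this pulls in the system libz
    print(mapped_zlib_paths())
```

Note also that ldd lists transitive dependencies, so the libz.so.1 entry may come from another library that gztest.so links against (for example libpython2.6) rather than from gztest.so itself.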

Edit 2:

I ended up doing the following:

  • compiled my custom zlib with all symbols exported with a z_ prefix. zlib's configure script makes this very easy: just run ./configure --zprefix ....
  • called gzopen64() instead of gzopen() in my Cython code, to make sure I am using the correct "underlying" symbol.
  • used z_off64_t explicitly.
  • statically linked my custom libz.a into the shared library generated by Cython. I used '-Wl,--whole-archive /home/alok/zlib_lfs_z/lib/libz.a -Wl,--no-whole-archive' while linking with gcc to achieve that. There might be other ways, or this might not be needed, but it seemed the simplest way to make sure the correct library gets used.

With the above changes, large files work while the rest of the Python extension modules/processes work as before.
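
The build settings for this approach could be sketched roughly as follows. This is a reconstruction from the list above, not the exact setup.py that was used: the helper name custom_zlib_kwargs and its zlib_prefix argument are illustrative, and the large-file macros are an assumption carried over from the earlier build.

```python
import os

def custom_zlib_kwargs(zlib_prefix):
    """Build Extension keyword arguments that link a z_-prefixed static zlib."""
    archive = os.path.join(zlib_prefix, 'lib', 'libz.a')
    return {
        'include_dirs': [os.path.join(zlib_prefix, 'include')],
        # One list element per flag, so each reaches gcc as its own argument.
        'extra_compile_args': ['-D_LARGEFILE64_SOURCE', '-D_FILE_OFFSET_BITS=64'],
        # Pull every object from libz.a into the extension module.
        'extra_link_args': ['-Wl,--whole-archive', archive,
                            '-Wl,--no-whole-archive'],
    }

kwargs = custom_zlib_kwargs('/home/alok/zlib_lfs_z')
print(kwargs['extra_link_args'])
# These would then be passed on as:
#   Extension('gztest', sources=['gztest.pyx'], **kwargs)
```

Because the custom library's symbols carry the z_ prefix, they cannot collide with the unprefixed symbols of the system libz.so.1 that the Python process already has loaded.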

Alok Singhal asked May 30 '13 16:05


2 Answers

Looks like this is similar to the problem in another question, except I get the opposite behavior.

I downloaded a tarball of zlib-1.2.8, ran ./configure, then changed the following Makefile variables...

CFLAGS=-O3  -fPIC -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64

SFLAGS=-O3  -fPIC -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64

...mostly to add the -fPIC to libz.a so I could link to it in a shared library.

I then added some printf() statements in the gzlib.c functions gzopen(), gzopen64(), and gz_open() so I could easily tell if these were being called.

After building libz.a and libz.so, I created a really simple foo.c...

#include "zlib-1.2.8/zlib.h"

int main(void)
{
    gzFile foo = gzopen("foo.gz", "rb");
    return 0;
}

...and compiled both a foo standalone binary, and a foo.so shared library with...

gcc -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o foo.o -c foo.c
gcc -o foo foo.o zlib-1.2.8/libz.a
gcc -shared -o foo.so foo.o zlib-1.2.8/libz.a

Running foo worked as expected, and printed...

gzopen64
gz_open

...but using the foo.so in Python with...

import ctypes

foo = ctypes.CDLL('./foo.so')
foo.main()

...didn't print anything, so I guess it's using Python's libz.so...

$ ldd `which python`
        ...
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5af2c68000)
        ...

...even though foo.so doesn't use it...

$ ldd foo.so
        linux-vdso.so.1 =>  (0x00007fff93600000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc8bfa98000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc8c0078000)

The only way I could get it to work was to open the custom libz.so directly with...

import ctypes

libz = ctypes.CDLL('zlib-1.2.8/libz.so.1.2.8')
libz.gzopen64('foo.gz', 'rb')

...which printed out...

gzopen64
gz_open

Note that the translation from gzopen to gzopen64 is done by the pre-processor, so I had to call gzopen64() directly.
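
When calling these functions through ctypes, it probably also helps to declare their prototypes, because ctypes assumes an int return value by default and a gzFile pointer can be truncated on 64-bit systems. A minimal sketch, written for Python 3 (hence the bytes arguments); the helper names load_zlib and read_head, and the fallback to gzopen() when gzopen64() is not exported, are assumptions for illustration:

```python
import ctypes
import ctypes.util
import os

def load_zlib(path=None):
    """Load zlib from an explicit path, else whatever the system locates."""
    path = path or ctypes.util.find_library('z')
    if path is None:
        raise OSError('no zlib shared library found')
    lib = ctypes.CDLL(path)
    # Prefer the explicit 64-bit entry point when the library exports it.
    gz_open = getattr(lib, 'gzopen64', None)
    if gz_open is None:
        gz_open = lib.gzopen
    gz_open.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    gz_open.restype = ctypes.c_void_p  # gzFile is a pointer, not an int
    lib.gzread.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_uint]
    lib.gzread.restype = ctypes.c_int
    lib.gzclose.argtypes = [ctypes.c_void_p]
    lib.gzclose.restype = ctypes.c_int
    return lib, gz_open

def read_head(gz_path, n=64, zlib_path=None):
    """Return up to n uncompressed bytes from the start of gz_path."""
    lib, gz_open = load_zlib(zlib_path)
    fp = gz_open(os.fsencode(gz_path), b'rb')
    if not fp:
        raise IOError('gzopen failed: %s' % gz_path)
    try:
        buf = ctypes.create_string_buffer(n)
        got = lib.gzread(fp, buf, n)
        if got < 0:
            raise IOError('gzread failed: %s' % gz_path)
        return buf.raw[:got]
    finally:
        lib.gzclose(fp)
```

Passing zlib_path='zlib-1.2.8/libz.so.1.2.8' would select the custom build, as in the CDLL() call above.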

So that's one way to fix it, but a better way would probably be to recompile your custom Python 2.6 to either link to the static zlib-1.2.8/libz.a, or disable zlibmodule.c completely, then you'll have more flexibility in your linking options.


Update

Regarding _LARGEFILE_SOURCE vs. _LARGEFILE64_SOURCE: I only pointed that out because of this comment in zlib.h...

/* provide 64-bit offset functions if _LARGEFILE64_SOURCE defined, and/or
 * change the regular functions to 64 bits if _FILE_OFFSET_BITS is 64 (if
 * both are true, the application gets the *64 functions, and the regular
 * functions are changed to 64 bits) -- in case these are set on systems
 * without large file support, _LFS64_LARGEFILE must also be true
 */

...the implication being that the gzopen64() function won't be exposed if you don't define _LARGEFILE64_SOURCE. I'm not sure if _LFS64_LARGEFILE applies to your system or not.

Aya answered Sep 27 '22 17:09



I would recommend using ctypes. Write your C library as a normal shared library and then use ctypes to access it. You would need to write a bit more Python code to transfer the data from Python data structures into C ones. The big advantage is that you can isolate everything from the rest of the system: you can explicitly specify the *.so file you would like to load, and the Python C API is not needed. I have had good experiences with ctypes. This should not be too difficult for you, since you seem proficient with C.
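
As a small illustration of loading a specific *.so, the sketch below opens a zlib build by path and asks it for its version via zlibVersion(), which is a quick way to confirm that the intended library was loaded. The helper name zlib_version is made up for this example, and the find_library() call is only a stand-in for the explicit path to your own build.

```python
import ctypes
import ctypes.util

def zlib_version(so_path):
    """Return the version string reported by the zlib at so_path."""
    lib = ctypes.CDLL(so_path)
    lib.zlibVersion.argtypes = []
    lib.zlibVersion.restype = ctypes.c_char_p  # returns a C string
    return lib.zlibVersion().decode('ascii')

if __name__ == '__main__':
    # Stand-in for an explicit path such as a custom libz.so.1.2.8.
    path = ctypes.util.find_library('z')
    if path:
        print(zlib_version(path))
```

If the library was configured with a symbol prefix, the exported name would differ (z_zlibVersion instead of zlibVersion), so the attribute lookup would need to be adjusted accordingly.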

Mike Müller answered Sep 27 '22 16:09
