I am writing a Python C extension on a very old Red Hat system. The system has zlib 1.2.3, which does not correctly support large files. Unfortunately, I can't just upgrade the system zlib to a newer version, since some of the packages poke into internal zlib structures and that breaks on newer zlib versions.
I would like to build my extension so that all the zlib calls (gzopen(), gzseek(), etc.) are resolved to a custom zlib that I install in my user directory, without affecting the rest of the Python executable and other extensions.
I have tried statically linking libz.a by adding it to the gcc command line during linking, but it did not work (for example, it still cannot create large files using gzopen()). I also tried passing -z origin -Wl,-rpath=/path/to/zlib -lz to gcc, but that did not work either.
Since newer versions of zlib are still named zlib 1.x, the soname is the same, so I think symbol versioning would not work. Is there a way to do what I want to do?
I am on a 32-bit Linux system. Python version is 2.6, which is custom-built.
Edit:
I created a minimal example. I am using Cython (version 0.19.1).
File gztest.pyx:
from libc.stdio cimport printf, fprintf, stderr
from libc.string cimport strerror
from libc.errno cimport errno
from libc.stdint cimport int64_t

cdef extern from "zlib.h":
    ctypedef void *gzFile
    ctypedef int64_t z_off_t

    int gzclose(gzFile fp)
    gzFile gzopen(char *path, char *mode)
    int gzread(gzFile fp, void *buf, unsigned int n)
    char *gzerror(gzFile fp, int *errnum)
    z_off_t gztell(gzFile fp)

cdef void print_error(gzFile gzfp):
    cdef int errnum = 0
    cdef const char *s = gzerror(gzfp, &errnum)
    fprintf(stderr, "error (%d): %s (%d: %s)\n", errno, strerror(errno), errnum, s)

cdef class GzFile:
    cdef gzFile fp
    cdef char *path

    def __init__(self, path, mode='rb'):
        self.path = path
        self.fp = gzopen(path, mode)
        if self.fp == NULL:
            raise IOError('%s: %s' % (path, strerror(errno)))

    cdef int read(self, void *buf, unsigned int n):
        cdef int r = gzread(self.fp, buf, n)
        if r <= 0:
            print_error(self.fp)
        return r

    cdef z_off_t tell(self):
        return gztell(self.fp)

    cdef int close(self):
        return gzclose(self.fp)

def read_test():
    cdef GzFile ifp = GzFile('foo.gz')
    cdef char buf[8192]
    cdef int i, j
    cdef int n

    errno = 0
    for 0 <= i < 0x200:
        for 0 <= j < 0x210:
            n = ifp.read(buf, sizeof(buf))
            if n <= 0:
                break
        if n <= 0:
            break
    printf('%lld\n', <long long>ifp.tell())
    ifp.close()
File setup.py:
import sys
import os

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

if __name__ == '__main__':
    if 'CUSTOM_GZ' in os.environ:
        d = {
            'include_dirs': ['/home/alok/zlib_lfs/include'],
            'extra_objects': ['/home/alok/zlib_lfs/lib/libz.a'],
            'extra_compile_args': ['-D_LARGEFILE_SOURCE',
                                   '-D_FILE_OFFSET_BITS=64',
                                   '-g3', '-ggdb'],
        }
    else:
        d = {'libraries': ['z']}
    ext = Extension('gztest', sources=['gztest.pyx'], **d)
    setup(name='gztest', cmdclass={'build_ext': build_ext}, ext_modules=[ext])
My custom zlib is in /home/alok/zlib_lfs (zlib version 1.2.8):
$ ls ~/zlib_lfs/lib/
libz.a libz.so libz.so.1 libz.so.1.2.8 pkgconfig
To compile the module using this libz.a:
$ CUSTOM_GZ=1 python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/alok/zlib_lfs/include -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb
gcc -shared build/temp.linux-x86_64-2.6/gztest.o /home/alok/zlib_lfs/lib/libz.a -L/opt/lib -lpython2.6 -o /home/alok/gztest.so
gcc is being passed all the flags I want (the full path to libz.a, the large-file flags, etc.).
To build the extension without my custom zlib, I can compile without CUSTOM_GZ defined:
$ python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o
gcc -shared build/temp.linux-x86_64-2.6/gztest.o -L/opt/lib -lz -lpython2.6 -o /home/alok/gztest.so
We can check the sizes of the two gztest.so files:
$ stat --format='%s %n' original/gztest.so custom/gztest.so
62398 original/gztest.so
627744 custom/gztest.so
So, the statically linked file is much larger, as expected.
I can now do:
>>> import gztest
>>> gztest.read_test()
and it will try to read foo.gz in the current directory.
When I do that using the non-statically-linked gztest.so, it works as expected until it tries to read more than 2 GB.
When I do that using the statically linked gztest.so, it dumps core:
$ python -c 'import gztest; gztest.read_test()'
error (2): No such file or directory (0: )
0
Segmentation fault (core dumped)
The error about No such file or directory is misleading -- the file exists, and gzopen() actually returns successfully. It is gzread() that fails.
Here is the gdb backtrace:
(gdb) bt
#0 0xf730eae4 in free () from /lib/libc.so.6
#1 0xf70725e2 in ?? () from /lib/libz.so.1
#2 0xf6ce9c70 in __pyx_f_6gztest_6GzFile_close (__pyx_v_self=0xf6f75278) at gztest.c:1140
#3 0xf6cea289 in __pyx_pf_6gztest_2read_test (__pyx_self=<optimized out>) at gztest.c:1526
#4 __pyx_pw_6gztest_3read_test (__pyx_self=0x0, unused=0x0) at gztest.c:1379
#5 0xf769910d in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:3690
#6 PyEval_EvalFrameEx (f=0x8115c64, throwflag=0) at Python/ceval.c:2389
#7 0xf769a3b4 in PyEval_EvalCodeEx (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#8 0xf769a433 in PyEval_EvalCode (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4) at Python/ceval.c:522
#9 0xf76bbe1a in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1335
#10 PyRun_StringFlags (str=0x80a24c0 "import gztest; gztest.read_test()\n", start=257, globals=0xf6ff81c4, locals=0xf6ff81c4, flags=0xffbf2888) at Python/pythonrun.c:1298
#11 0xf76bd003 in PyRun_SimpleStringFlags (command=0x80a24c0 "import gztest; gztest.read_test()\n", flags=0xffbf2888) at Python/pythonrun.c:957
#12 0xf76ca1b9 in Py_Main (argc=1, argv=0xffbf2954) at Modules/main.c:548
#13 0x080485b2 in main ()
One of the problems seems to be that frame #1 in the backtrace refers to libz.so.1! If I run ldd gztest.so, I get, among other lines:
libz.so.1 => /lib/libz.so.1 (0xf6f87000)
I am not sure why that is happening though.
Edit 2:
I ended up doing the following:

1. Renamed all zlib symbols with a z_ prefix. zlib's configure script makes this very easy: just run ./configure --zprefix ....
2. Used gzopen64() instead of gzopen() in my Cython code. This is because I wanted to make sure I am using the correct "underlying" symbol.
3. Used z_off64_t explicitly.
4. Linked zlib's libz.a into the shared library generated by Cython. I used '-Wl,--whole-archive /home/alok/zlib_lfs_z/lib/libz.a -Wl,--no-whole-archive' while linking with gcc to achieve that. There might be other ways, or this might not be needed, but it seemed the simplest way to make sure the correct library gets used.

With the above changes, large files work while the rest of the Python extension modules/processes work as before.
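The whole-archive link can also be expressed through the setup.py shown earlier via extra_link_args, rather than by editing the gcc command by hand (a sketch using the question's paths; the zlib_lfs_z directory is the prefixed rebuild and is assumed):

```python
try:
    from setuptools import Extension  # distutils was removed in Python 3.12
except ImportError:
    from distutils.core import Extension

# Pull the entire prefixed static zlib into the extension's .so.
# --whole-archive must be turned off again afterwards so that later
# archives on the link line are not also swallowed wholesale.
ext = Extension(
    'gztest',
    sources=['gztest.pyx'],
    include_dirs=['/home/alok/zlib_lfs_z/include'],
    extra_link_args=[
        '-Wl,--whole-archive',
        '/home/alok/zlib_lfs_z/lib/libz.a',
        '-Wl,--no-whole-archive',
    ],
)
print(ext.name, len(ext.extra_link_args))
```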
Looks like this is similar to the problem in another question, except I get the opposite behavior.
I downloaded a tarball of zlib-1.2.8, ran ./configure, then changed the following Makefile variables...
CFLAGS=-O3 -fPIC -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64
SFLAGS=-O3 -fPIC -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64
...mostly to add -fPIC to libz.a so I could link it into a shared library.
I then added some printf() statements to the gzlib.c functions gzopen(), gzopen64(), and gz_open() so I could easily tell whether they were being called.
After building libz.a and libz.so, I created a really simple foo.c...
#include "zlib-1.2.8/zlib.h"

int main(void)
{
    gzFile foo = gzopen("foo.gz", "rb");
    gzclose(foo);
    return 0;
}
...and compiled both a foo standalone binary and a foo.so shared library with...
gcc -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o foo.o -c foo.c
gcc -o foo foo.o zlib-1.2.8/libz.a
gcc -shared -o foo.so foo.o zlib-1.2.8/libz.a
Running foo worked as expected, and printed...
gzopen64
gz_open
...but using the foo.so in Python with...
import ctypes
foo = ctypes.CDLL('./foo.so')
foo.main()
...didn't print anything, so I guess it's using Python's libz.so...
$ ldd `which python`
...
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5af2c68000)
...
...even though foo.so doesn't use it...
$ ldd foo.so
linux-vdso.so.1 => (0x00007fff93600000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc8bfa98000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc8c0078000)
The only way I could get it to work was to open the custom libz.so directly with...
import ctypes
libz = ctypes.CDLL('zlib-1.2.8/libz.so.1.2.8')
libz.gzopen64('foo.gz', 'rb')
...which printed out...
gzopen64
gz_open
Note that the translation from gzopen() to gzopen64() is done by the pre-processor, so I had to call gzopen64() directly.
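Because the mapping happens at compile time, at run time gzopen and gzopen64 are simply two distinct exported symbols. A quick way to see this from Python (a sketch; assumes a Linux system whose zlib was built with large-file support, so that gzopen64 is exported):

```python
import ctypes
import ctypes.util

# Load whichever zlib the dynamic linker finds; 'libz.so.1' is the
# conventional soname on Linux.
libz = ctypes.CDLL(ctypes.util.find_library('z') or 'libz.so.1')

# ctypes never sees zlib.h, so the pre-processor mapping is invisible:
# each function must be looked up under its real exported name.
has_gzopen = hasattr(libz, 'gzopen')
has_gzopen64 = hasattr(libz, 'gzopen64')
print(has_gzopen, has_gzopen64)
```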
So that's one way to fix it, but a better way would probably be to recompile your custom Python 2.6 to either link to the static zlib-1.2.8/libz.a, or disable zlibmodule.c completely; then you'll have more flexibility in your linking options.
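For the static-link route, Python's own zlib module is configured in the source tree's Modules/Setup file rather than through distutils; an untested sketch of what that line could look like (the paths are the question's and assumed):

```
# Modules/Setup: build the zlib module against a specific static libz
# instead of the system -lz (hypothetical paths -- adjust to your tree).
zlib zlibmodule.c -I/home/alok/zlib_lfs/include /home/alok/zlib_lfs/lib/libz.a
```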
Update
Regarding _LARGEFILE_SOURCE vs. _LARGEFILE64_SOURCE: I only pointed that out because of this comment in zlib.h...
/* provide 64-bit offset functions if _LARGEFILE64_SOURCE defined, and/or
* change the regular functions to 64 bits if _FILE_OFFSET_BITS is 64 (if
* both are true, the application gets the *64 functions, and the regular
* functions are changed to 64 bits) -- in case these are set on systems
* without large file support, _LFS64_LARGEFILE must also be true
*/
...the implication being that the gzopen64() function won't be exposed if you don't define _LARGEFILE64_SOURCE. I'm not sure whether _LFS64_LARGEFILE applies to your system or not.
I would recommend using ctypes. Write your C library as a normal shared library and then use ctypes to access it. You would need to write a bit more Python code to transfer the data from Python data structures into C ones, but the big advantage is that you can isolate everything from the rest of the system: you can explicitly specify the *.so file you would like to load, and the Python C API is not needed. I have had quite good experiences with ctypes, and this should not be too difficult for you since you seem proficient in C.
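A minimal sketch of this approach, loading one specific shared object by explicit name (here the system soname libz.so.1, standing in for something like /home/alok/zlib_lfs/lib/libz.so.1.2.8; assumes a Linux system with zlib installed):

```python
import ctypes

# CDLL with an explicit name or absolute path isolates us from
# whatever zlib the interpreter itself was linked against.
libz = ctypes.CDLL('libz.so.1')

# Declare the return type before calling; ctypes assumes int otherwise.
libz.zlibVersion.restype = ctypes.c_char_p
print(libz.zlibVersion())
```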