Loading vs linking in Cython modules

Tags:

python

cython

While exploring the Cython compile steps, I found that I need to link C libraries such as the math library explicitly in setup.py. However, no such step was needed for numpy. Why is that? Is numpy imported through the usual Python import mechanism? If so, does that mean we never need to link an extension module explicitly in Cython?

I searched the official documentation, but unfortunately found no explanation of when explicit linking is required and when it is handled automatically.


asked Sep 29 '19 13:09

Avinash Tripathi

1 Answer

Calling a cdef function corresponds more or less to a jump to an address in memory - the one from which the instructions should be read and executed. The question is how this address is provided. There are several cases to consider:

A. inline functions

The code of those functions is either inlined or the definition of the function is in the same translation unit, so the address is known to the linker at link time (or even to the compiler at compile time) - no additional libraries are needed.

Header-only libraries are an example.

Consequences: Only the include path(s) need to be provided in setup.py.

B. static linking

The definition/functionality we need is in another translation unit/library - the target address of the jump is calculated at link time and cannot be changed afterwards.

Examples are additional c/cpp files or static libraries which are added to the extension definition.

Consequences: The static library should be added to setup.py, i.e. library path and library name, along with the include paths.

C. dynamic linking

The necessary functionality is provided in a shared object/dll. The address to jump to is calculated at run time by the loader and can be changed at program start by exchanging the loaded shared objects.

Examples are libstdc++ (usually added automatically by g++) or libm, which gcc does not add to the linker command automatically.

Consequences: The dynamic library should be added to setup.py, i.e. library path and library name, possibly an rpath, plus the include paths. The shared object/dll must be available at run time. More information (than one would probably like to know) about Cython/Python using dynamic libraries can be found in this SO-post.
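The three "Consequences" above can be summarized as a minimal sketch of the keyword arguments one might pass to setuptools.Extension. All names and paths (foo, vendor/include, vendor/lib, mystatic, myshared) are hypothetical placeholders, and the function only builds plain dictionaries so that it runs without setuptools or Cython installed.

```python
def extension_kwargs(case):
    """Return hypothetical setuptools.Extension keyword arguments for case A, B or C."""
    # Every case needs the headers, so the include path is always present.
    kwargs = {
        "name": "foo",
        "sources": ["foo.pyx"],
        "include_dirs": ["vendor/include"],
    }
    if case == "A":
        # Header-only/inline code: nothing to link against.
        pass
    elif case == "B":
        # Static library: resolved once at link time.
        kwargs["library_dirs"] = ["vendor/lib"]
        kwargs["libraries"] = ["mystatic"]
    elif case == "C":
        # Shared libraries: named at link time, located again by the loader at run time.
        kwargs["library_dirs"] = ["vendor/lib"]
        kwargs["libraries"] = ["myshared", "m"]  # "m" is libm, which gcc does not add itself
        kwargs["runtime_library_dirs"] = ["vendor/lib"]  # the rpath mentioned above
    else:
        raise ValueError(f"unknown case: {case!r}")
    return kwargs


# In a real setup.py one would then write something like:
#   from setuptools import Extension, setup
#   from Cython.Build import cythonize
#   setup(ext_modules=cythonize([Extension(**extension_kwargs("C"))]))
```

The only difference between the cases is which linker-related arguments appear; the include path is needed in all of them.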

D. Calling via a pointer

The linker is needed only when we call a function by its name. If we call it via a function pointer, we don't need a linker/loader because the address of the function is already known - it is the value stored in the function pointer.

Example: Cython-generated modules use this machinery to give access to their cdef functions exported through a pxd file. The module creates a data structure of function pointers (stored as the variable __pyx_capi__ in the module itself), which is filled once the so/dll has been loaded via dlopen (or the Windows equivalent, LoadLibrary). The lookup in the dictionary happens only once, when the module is loaded, and the addresses of the functions are cached, so calls at run time have almost no overhead.
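As a small illustration of calling through a pointer, here is a sketch using only ctypes from the standard library. The callback py_double and the prototype are made up for this example; the point is that a raw integer address is all that is needed to make the call, and no symbol name is involved.

```python
import ctypes

# C signature "int (*)(int)" for our function pointer.
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def py_double(x):
    return 2 * x


# Wrap the Python function so that it is callable from C; keep a reference
# to the wrapper, otherwise the underlying thunk could be freed.
c_callback = PROTO(py_double)

# The only thing we carry around is the raw address, a plain integer.
address = ctypes.cast(c_callback, ctypes.c_void_p).value

# Anyone who knows the address and the signature can call the function.
call_by_address = PROTO(address)
result = call_by_address(21)
print(result)  # 42
```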

We can inspect it, for example via

#foo.pyx:
cdef void doit():
    print("doit")
#foo.pxd
cdef void doit()

$ cythonize -3 -i foo.pyx
$ python -c "import foo; print(foo.__pyx_capi__)"
{'doit': <capsule object "void (void)" at 0x7f7b10bb16c0>}

Now, calling a cdef function from another module is just a jump to the corresponding address.

Consequences: We need to cimport the needed functionality.
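The capsule mechanism can be imitated from pure Python. This is only a sketch of the idea, assuming CPython: it uses the real C-API functions PyCapsule_New and PyCapsule_GetPointer through ctypes.pythonapi, while exporter_capi, triple and the signature string are hypothetical stand-ins for what a Cython module would generate.

```python
import ctypes

PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
SIGNATURE = b"int (int)"  # capsule name; must stay alive as long as the capsule

PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


def triple(x):
    return 3 * x


# "Exporting" side: wrap the function pointer in a capsule and publish it
# in a dictionary, similar in spirit to __pyx_capi__.
_c_triple = PROTO(triple)  # keep alive
exporter_capi = {
    "triple": PyCapsule_New(
        ctypes.cast(_c_triple, ctypes.c_void_p).value, SIGNATURE, None
    )
}

# "Importing" side: one dictionary lookup at import time, then the address
# is cached and every later call goes straight through the pointer.
address = PyCapsule_GetPointer(exporter_capi["triple"], SIGNATURE)
cached_triple = PROTO(address)
result = cached_triple(14)
print(result)  # 42
```

The signature string doubles as a cheap type check: PyCapsule_GetPointer fails if the requested name does not match the one stored in the capsule.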

NumPy is a little more complicated, as it uses a sophisticated combination of A and D to postpone the resolution of symbols until run time, so no shared object/dll is needed at link time (but it is needed at run time!).

Some functionality from the numpy pxd file can be used directly because it is inlined (or is even just a define), for example PyArray_NDIM, and basically everything from ndarraytypes.h. This is why one can use Cython's ndarrays without much ado.

Other functionality (basically everything from ndarrayobject.h), for example PyArray_FromAny, cannot be accessed without calling np.import_array() in an initialization step. Why?

The answer is in the header __multiarray_api.h, which is included by ndarrayobject.h but cannot be found in the git repository because it is generated when NumPy is built. There the definitions of these functions can be looked up, for example that of PyArray_CheckFromAny:

...
static void **PyArray_API=NULL; //usually...
...
#define PyArray_CheckFromAny \
        (*(PyObject * (*)(PyObject *, PyArray_Descr *, int, int, int, PyObject *)) \
         PyArray_API[108])
...

PyArray_CheckFromAny isn't the name of a function but a define for a function pointer stored in PyArray_API, which is not initialized (i.e. is NULL) when the module is first loaded! Btw, there is also a (private) function called PyArray_CheckFromAny, which is what the function pointer actually points to - and because the public version is a define, there is no name collision at link time...

The last piece of the puzzle: the function _import_array (more or less the workhorse behind np.import_array) is an inline function (case A), so only the include path is needed to use it.

_import_array uses a similar approach to Cython's __pyx_capi__ to get the function pointers: The field is called _ARRAY_API and can be inspected via:

>>> import numpy.core._multiarray_umath as macore
>>> macore._ARRAY_API
<capsule object NULL at 0x7f17d85f3810>

More info about how PyArray_API can be initialized can be found in this SO-answer of mine.
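The interplay of the NULL-initialized table, the define and _import_array can be mimicked with a toy model in plain Python. This is an analogy rather than NumPy's actual code: every name below is hypothetical, the slot index is arbitrary, and the explicit RuntimeError stands in for what would be a crash in C when the NULL table is dereferenced.

```python
# Toy model of the API table; plays the role of `static void **PyArray_API=NULL;`
_api_table = None
CHECK_FROM_ANY_SLOT = 0  # the generated header uses a fixed index such as 108


def _private_check_from_any(obj):
    """Stand-in for the private implementation the table entry points to."""
    return list(obj)


def check_from_any(obj):
    """Plays the role of the define: index the table, then call through it."""
    if _api_table is None:
        # C has no such check; calling through the NULL table would crash.
        raise RuntimeError("API table not initialised, call import_array_sketch() first")
    return _api_table[CHECK_FROM_ANY_SLOT](obj)


def import_array_sketch():
    """Plays the role of _import_array: obtain the table once and cache it."""
    global _api_table
    _api_table = [_private_check_from_any]


try:
    check_from_any((1, 2))  # too early: the table is still empty
except RuntimeError as exc:
    early_error = str(exc)

import_array_sketch()
converted = check_from_any((1, 2))
print(converted)  # [1, 2]
```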

However, when using functionality from numpy/math.pxd, one has to statically link numpy's math library (see for example this SO-question).


answered Sep 19 '22 18:09

ead
