I am wondering what's the right way to #include all numpy headers and what's the right way to use Cython and C++ to parse numpy arrays. Below is my attempt. Thank you all in advance.
// cpp_parser.h
#ifndef _FUNC_H_
#define _FUNC_H_
#include <Python.h>
#include <numpy/arrayobject.h>
void parse_ndarray(PyObject *);
#endif
I know this might be wrong; I also tried other options, but none of them worked.
// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>
using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws a seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}
The PyArray_Check routine throws a segmentation fault. PyArray_CheckExact doesn't, but it is not exactly what I want.
# parser.pxd
cdef extern from "cpp_parser.h":
    cdef void parse_ndarray(object)
and the implementation file is:
# parser.pyx
import numpy as np
cimport numpy as np

def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)
The setup.py script is:
# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy as np

ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)

setup(
    name='parser',
    ext_modules=cythonize([ext])
)
And finally the test script:
# run_test.py
import numpy as np
from parser import py_parse_array
x = np.arange(10)
py_parse_array(x)
I have created a git repo with all the scripts above: https://github.com/giantwhale/study_cython_numpy/
Quick Fix (read on for more details and a more sophisticated approach):
You need to initialize the variable PyArray_API in every cpp file in which you use numpy functionality, by calling import_array():
// this is only a trick to ensure import_array() is called when the *.so is loaded,
// and called only once
int init_numpy(){
    import_array(); // raises a Python error if not successful
    return 0;
}
const static int numpy_initialized = init_numpy();

void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}
One could also use _import_array, which returns a negative number if unsuccessful, in order to implement custom error handling. See here for the definition of import_array.
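For illustration, a minimal sketch of such custom handling, assuming it replaces init_numpy() in the same cpp file as above (so <iostream>, using namespace std and the numpy headers are already in place):
// sketch: init_numpy() with custom error handling via _import_array()
// instead of the import_array() macro
int init_numpy(){
    if (_import_array() < 0) {   // negative return value signals failure
        PyErr_Print();           // print and clear the pending Python error
        cerr << "numpy.core.multiarray failed to import" << endl;
        return -1;
    }
    return 0;
}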
Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not always when the Python interpreter is embedded, because numpy_initialized is initialized before main() starts. In that case the "initialization trick" should not be used; instead, init_numpy() should be called explicitly after Py_Initialize().
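For the embedded case, a minimal sketch (main() and its body are purely hypothetical) could look like this:
// sketch: embedding Python - set up numpy only after Py_Initialize()
#include <Python.h>

int init_numpy();            // as above, but NOT triggered by a static initializer

int main() {
    Py_Initialize();         // the interpreter must exist first
    if (init_numpy() != 0) { // now it is safe to initialize PyArray_API
        return 1;
    }
    // ... use the numpy C-API here ...
    Py_Finalize();
    return 0;
}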
Sophisticated solution:
NB: For why setting PyArray_API is needed at all, see this SO answer: it allows symbol resolution to be postponed until run time, so numpy's shared objects aren't needed at link time and don't have to be on the dynamic-library path (Python's system path is enough then).
The proposed solution is quick, but if more than one cpp file uses numpy, you end up with a lot of separately initialized instances of PyArray_API.
This can be avoided if PyArray_API is defined not as static but as extern in all but one translation unit. For those translation units the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.
However, we need one translation unit in which this symbol is actually defined; for that translation unit the macro NO_IMPORT_ARRAY must not be defined.
Without also defining the macro PY_ARRAY_UNIQUE_SYMBOL, though, we would only get a static symbol, i.e. one invisible to other translation units, and linking would fail. The reason for this default: if two libraries each defined a non-static PyArray_API, we would have multiple definitions of the same symbol, the linker would fail, and the two libraries could not be used together.
Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as, say, MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h, we get our own PyArray_API name, which does not clash with other libraries.
Putting it all together:
A: use_numpy.h - your header for including numpy functionality, i.e. numpy/arrayobject.h:
//use_numpy.h

//your fancy name for the dedicated PyArray_API-symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API

//INIT_NUMPY_ARRAY_CPP is defined only in the one translation unit
//that initializes the API (init_numpy_api.cpp below)
#ifndef INIT_NUMPY_ARRAY_CPP
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now everything is set up, just include the numpy arrays:
#include <numpy/arrayobject.h>
B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API:
//init_numpy_api.cpp

//first make clear: here we initialize the MY_PyArray_API
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"

//now the old trick with initialization:
int init_numpy(){
    import_array(); // PyError if not successful
    return 0;
}
const static int numpy_initialized = init_numpy();
C: just include use_numpy.h whenever you need numpy; it will declare extern void **MY_PyArray_API:
//example
#include "use_numpy.h"
...
PyArray_Check(obj); // works, no segmentation error
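Applied to the original question, cpp_parser.cpp then stays almost unchanged; a sketch, assuming cpp_parser.h now includes "use_numpy.h" instead of <numpy/arrayobject.h>:
// sketch: cpp_parser.cpp using the shared use_numpy.h
// (PyArray_API, here renamed MY_PyArray_API, is set up in init_numpy_api.cpp)
#include "cpp_parser.h"   // assumed to include "use_numpy.h"
#include <iostream>
using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) {   // works: the API table was initialized elsewhere
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}
Don't forget to add init_numpy_api.cpp to the sources list in setup.py, otherwise the translation unit that actually initializes the API table is never linked in.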
Warning: It should not be forgotten that, for the initialization trick to work, Py_Initialize() must already have been called.
Why do you need it (kept for historical reasons):
When I build your extension with debug symbols:
extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],
and run it with gdb:
gdb --args python run_test.py
(gdb) run
--- Segmentation fault
(gdb) disass
I can see the following:
0x00007ffff1d2a6d9 <+20>: mov 0x203260(%rip),%rax   # 0x7ffff1f2d940 <_ZL11PyArray_API>
0x00007ffff1d2a6e0 <+27>: add $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>: mov (%rax),%rax
...
(gdb) print $rax
$1 = 16
We should keep in mind that PyArray_Check is only a define for:
#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)
It seems that &PyArray_Type somehow uses a part of PyArray_API which is not initialized (has the value 0).
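That matches what the generated numpy header does: PyArray_Type itself is looked up in the PyArray_API table. Roughly (a sketch; the exact index can differ between numpy versions):
// sketch of the generated __multiarray_api.h (index may vary by numpy version)
#define PyArray_Type (*(PyTypeObject *)PyArray_API[2])
// with PyArray_API == NULL this reads from address 2 * sizeof(void*) = 16,
// consistent with the $rax = 16 seen at the faulting instruction above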
Let's take a look at cpp_parser.cpp after the preprocessor (compiled with the flag -E):
static void **PyArray_API= __null
...
static int
_import_array(void)
{
PyArray_API = (void **)PyCapsule_GetPointer(c_api,...
So PyArray_API is static and is initialized via _import_array(void). That actually explains the warning I get during the build, that _import_array() was defined but not used: we never initialized PyArray_API.
Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. in every cpp file.
So we just need to do it; import_array() seems to be the official way.
Since you use Cython, the numpy C API is already available through Cython's bundled includes. It's straightforward in a Jupyter notebook:
cimport numpy as np
from numpy cimport PyArray_Check
np.import_array()  # Attention!

def parse_ndarray(object ndarr):
    if PyArray_Check(ndarr):
        print("PyArray_Check Passed")
    else:
        print("PyArray_Check Failed")
I believe np.import_array() is the key here, since you call into the numpy C API. Comment it out and try again: the crash reappears.
import numpy as np
from array import array
ndarr = np.arange(3)
pyarr = array('i', range(3))
parse_ndarray(ndarr)
parse_ndarray(pyarr)
parse_ndarray("Trick or treat!")
Output:
PyArray_Check Passed
PyArray_Check Failed
PyArray_Check Failed