I have some code written in Python whose output is a numpy array, and now I want to send that output to C++ code, where the heavy part of the calculations will be performed.
I have tried using Cython's public cdef, but I am running into some issues. I would appreciate your help! Here goes my code:
pymodule.pyx:
from pythonmodule import result # result is my numpy array
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cdef public void cfunc():
    print 'I am in here!!!'
    cdef np.ndarray[np.float64_t, ndim=2, mode='c'] res = result
    print res
Once this is cythonized, I call:
pymain.c:
#include <Python.h>
#include <numpy/arrayobject.h>
#include "pymodule.h"
int main() {
    Py_Initialize();
    initpymodule();
    test(2);
    Py_Finalize();
}
int test(int a)
{
    Py_Initialize();
    initpymodule();
    cfunc();
    return 0;
}
I am getting a NameError for the result variable in C++. I have tried defining it with pointers and calling it indirectly from other functions, but the array remains invisible. I am pretty sure the answer is quite simple, but I just do not get it. Thanks for your help!
That is why we need the ctypes library to specify the C data types that we pass to our C function. The ctypes.data attribute of a numpy array returns a pointer to the array's data, and we use c_void_p to specify that we are passing a pointer to our function. In the same way, we use c_int to indicate that we are passing a value of type int.
The C function takes a pointer to the numpy array's data, then uses malloc to allocate enough space for the resulting array. It then iterates over the matrix using a double for loop. Notice that the outer loop runs from 0 to n*m in increments of n; this is because we iterate over the 2D array as if it were a 1D array.
However, sometimes this is not appropriate or convenient, and you just want to write a function in C and then call it directly from Python. Numpy is one of the most popular packages for scientific computing in Python. It allows you to create very efficient matrices and vectors in Python with a C backend.
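The flat traversal described above (an outer loop over row offsets from 0 to n*m in steps of n, an inner loop along each row) can be sketched in C roughly as follows. The function name scale_matrix and its factor parameter are made up for illustration, since the C function being described is not shown here; the sketch also assumes the array is C-contiguous float64.

```c
#include <stdlib.h>

/* Hypothetical example: returns a newly allocated copy of a row-major
 * m x n matrix with every element multiplied by `factor`.
 * The caller owns the result and must free() it.
 * Returns NULL on invalid input or allocation failure. */
double *scale_matrix(const double *in, int m, int n, double factor)
{
    if (in == NULL || m <= 0 || n <= 0)
        return NULL;

    double *out = malloc((size_t)m * (size_t)n * sizeof *out);
    if (out == NULL)
        return NULL;

    /* i is the flat offset of the first element of each row, so it
     * advances by n (the row length); j walks along that row. */
    for (int i = 0; i < n * m; i += n) {
        for (int j = 0; j < n; j++) {
            out[i + j] = in[i + j] * factor;
        }
    }
    return out;
}
```

Element (row, col) of the 2D array lives at flat index row * n + col, which is why the outer loop steps by n.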
The NameError was caused by the fact that Python couldn't find the module: the working directory isn't automatically added to your PYTHONPATH. Calling setenv("PYTHONPATH", ".", 1); in your C/C++ code fixes this.
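One caveat: passing 1 as the last argument of setenv overwrites any PYTHONPATH that is already set. If that matters, a minimal sketch that keeps the existing entries could look like this. It assumes a POSIX system (setenv, ':' as the separator), and prepend_cwd_to_pythonpath is a hypothetical helper name, not part of any library.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: puts "." at the front of PYTHONPATH while keeping
 * whatever was already there. Call it before Py_Initialize().
 * Returns 0 on success, -1 on failure. */
int prepend_cwd_to_pythonpath(void)
{
    const char *old = getenv("PYTHONPATH");
    if (old == NULL || old[0] == '\0')
        return setenv("PYTHONPATH", ".", 1);

    size_t len = strlen(old) + 3;   /* "." + ":" + old + '\0' */
    char *value = malloc(len);
    if (value == NULL)
        return -1;

    snprintf(value, len, ".:%s", old);
    int rc = setenv("PYTHONPATH", value, 1);  /* setenv copies the string */
    free(value);
    return rc;
}
```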
There's an easy way to do this, apparently. With a Python module pythonmodule.py containing an already created array:
import numpy as np
result = np.arange(20, dtype=np.float64).reshape((2, 10))
You can structure your pymodule.pyx to export that array by using the public keyword. By adding some auxiliary functions, you generally won't need to touch either the Python or the NumPy C-API:
from pythonmodule import result
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
cdef public np.ndarray getNPArray():
    """ Return array from pythonmodule. """
    return <np.ndarray>result

cdef public int getShape(np.ndarray arr, int shape):
    """ Return Shape of the Array based on shape par value. """
    return <int>arr.shape[1] if shape else <int>arr.shape[0]

cdef public void copyData(float *** dst, np.ndarray src):
    """ Copy data from src numpy array to dst. """
    cdef float **tmp
    cdef int i, j, m = src.shape[0], n = src.shape[1]
    # Allocate initial pointer
    tmp = <float **>malloc(m * sizeof(float *))
    if not tmp:
        raise MemoryError()
    # Allocate rows
    for j in range(m):
        tmp[j] = <float *>malloc(n * sizeof(float))
        if not tmp[j]:
            raise MemoryError()
    # Copy numpy Array
    for i in range(m):
        for j in range(n):
            tmp[i][j] = src[i, j]
    # Assign pointer to dst
    dst[0] = tmp
The functions getNPArray and getShape return the array and its shape, respectively. copyData was added to extract the ndarray.data and copy it, so you can then finalize Python and work without having the interpreter initialized.
A sample program (in C; C++ should look identical) would look like this:
#include <Python.h>
#include "numpy/arrayobject.h"
#include "pymodule.h"  // header generated by Cython from pymodule.pyx
#include <stdio.h>
#include <stdlib.h>    // setenv

void printArray(float **arr, int m, int n);
void getArray(float ***arr, int * m, int * n);

int main(int argc, char **argv){
    // Holds data and shapes.
    float **data = NULL;
    int m, n;
    // Gets array and then prints it.
    getArray(&data, &m, &n);
    printArray(data, m, n);
    return 0;
}

void getArray(float ***data, int * m, int * n){
    // setenv is important, makes python find
    // modules in working directory
    setenv("PYTHONPATH", ".", 1);
    // Initialize interpreter and module
    Py_Initialize();
    initpymodule();
    // Use Cython functions.
    PyArrayObject *arr = getNPArray();
    *m = getShape(arr, 0);
    *n = getShape(arr, 1);
    copyData(data, arr);
    if (data == NULL){ //really redundant.
        fprintf(stderr, "Data is NULL\n");
        return ;
    }
    Py_DECREF(arr);
    Py_Finalize();
}

void printArray(float **arr, int m, int n){
    int i, j;
    for(i=0; i < m; i++){
        for(j=0; j < n; j++)
            printf("%f ", arr[i][j]);
        printf("\n");
    }
}
Always remember to set:
setenv("PYTHONPATH", ".", 1);
before you call Py_Initialize, so Python can find modules in the working directory.
The rest is pretty straightforward. It might need some additional error checking, and it definitely needs a function to free the allocated memory.
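A matching cleanup function might look roughly like the sketch below. freeArray is a hypothetical name, and the sketch assumes copyData ran to completion, so that every one of the m row pointers is either a valid allocation or NULL. It takes a float *** (mirroring copyData) so it can reset the caller's pointer.

```c
#include <stdlib.h>

/* Hypothetical counterpart to copyData: releases an array that was
 * allocated as one block of m row pointers plus one block per row,
 * then sets the caller's pointer to NULL to prevent reuse. */
void freeArray(float ***arr, int m)
{
    if (arr == NULL || *arr == NULL)
        return;

    for (int i = 0; i < m; i++)
        free((*arr)[i]);   /* free(NULL) is a no-op */

    free(*arr);
    *arr = NULL;
}
```

In the sample program this would be called as freeArray(&data, m); after printArray.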
Doing it the way you are attempting is way more hassle than it's worth. You would probably be better off using numpy.save to save your array in a npy binary file and then using some C++ library that reads that file for you.
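To give a feel for what such a reader has to do, here is a hand-rolled sketch in C rather than an existing library. It only covers the simplest case and rests on several assumptions: the file uses .npy format version 1.0, the dtype is '<f8' (little-endian float64), the array is C-ordered with exactly two dimensions, and the host is little-endian with 8-byte doubles. The function name read_npy_f64_2d is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: reads a 2D float64 array from a version 1.0 .npy file.
 * Returns a malloc'd row-major buffer (caller frees) and stores the
 * shape in *rows and *cols, or returns NULL on any error or any
 * unsupported layout. */
double *read_npy_f64_2d(const char *path, long *rows, long *cols)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    double *data = NULL;
    char *header = NULL;
    unsigned char pre[10];  /* magic (6) + version (2) + header length (2) */

    if (fread(pre, 1, sizeof pre, f) == sizeof pre &&
        memcmp(pre, "\x93NUMPY", 6) == 0 && pre[6] == 1) {
        /* Version 1.0 stores the header length as a little-endian uint16. */
        size_t hlen = (size_t)pre[8] | ((size_t)pre[9] << 8);
        header = malloc(hlen + 1);
        if (header != NULL && fread(header, 1, hlen, f) == hlen) {
            header[hlen] = '\0';
            /* The header is a Python dict literal written as text. */
            const char *shape = strstr(header, "'shape': (");
            long r = 0, c = 0;
            int end = 0;  /* set by %n only if the closing ')' matched */
            if (strstr(header, "'descr': '<f8'") != NULL &&
                strstr(header, "'fortran_order': False") != NULL &&
                shape != NULL &&
                sscanf(shape, "'shape': (%ld, %ld)%n", &r, &c, &end) == 2 &&
                end > 0 && r > 0 && c > 0) {
                size_t count = (size_t)r * (size_t)c;
                data = malloc(count * sizeof *data);
                if (data != NULL &&
                    fread(data, sizeof *data, count, f) != count) {
                    free(data);   /* file shorter than the shape claims */
                    data = NULL;
                }
                if (data != NULL) {
                    *rows = r;
                    *cols = c;
                }
            }
        }
    }

    free(header);
    fclose(f);
    return data;
}
```

Anything outside those assumptions (other dtypes, Fortran order, newer format versions, big-endian hosts) is rejected or unsupported here, which is the main reason to prefer an existing library for real use.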