Convert a std::vector to a NumPy array without copying data

I have a C++ library with some methods that return a std::vector, declared like this:

public:
  const std::vector<uint32_t>& getValues() const;

I'm currently working on wrapping the whole library for Python using SWIG and this is working well so far.

SWIG wraps this getValues() function fine, so that it returns a Python tuple. The issue is that in my Python-side code I want to convert this to a NumPy array. Of course, I can do this with:

import numpy as np
my_array = np.array(my_object.getValues(), dtype='uint32')

but this copies every entry of the original vector twice: first into a Python tuple by SWIG, then again into a NumPy array by me. Since the vector could be very large, I'd rather avoid these two copies and find a way to have SWIG create a numpy.array wrapper around the original vector data in memory.

I've read the documentation for numpy.i, but it explicitly mentions that output arrays are not supported, since it seems to work under the assumption of C-style arrays rather than C++ vectors.

The underlying data structure of a numpy.array is just a C-style array, as is that of a C++ std::vector, so I would hope it is feasible to have both access the same data in memory.

Is there any way to make SWIG return a numpy.array which doesn't copy the original data?

asked Apr 17 '13 16:04 by Milliams


2 Answers

Apparently it is trivial to "cast" a C++ vector to a C array, because its elements are stored contiguously: &v[0] (or v.data() in C++11) points at them. See the answer to this question: How to convert vector to array in C++
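A minimal C++ sketch of that "cast", assuming a class shaped like the one in the question (the class name ValueHolder and the helper rawValues are hypothetical): since std::vector stores its elements contiguously, the pointer returned by data() indexes exactly the same elements as the vector itself, and nothing is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's class.
class ValueHolder {
public:
    explicit ValueHolder(std::vector<std::uint32_t> v) : values_(std::move(v)) {}
    const std::vector<std::uint32_t>& getValues() const { return values_; }

private:
    std::vector<std::uint32_t> values_;
};

// Hypothetical helper: returns a pointer to the vector's contiguous storage
// and writes the element count to `n`. No elements are copied.
inline const std::uint32_t* rawValues(const ValueHolder& h, std::size_t& n) {
    const std::vector<std::uint32_t>& v = h.getValues();
    n = v.size();
    return v.data();  // C++11; &v[0] is equivalent for a non-empty vector
}
```

The pointer aliases the vector's own buffer, so element i is p[i] and lives at the same address as &getValues()[i].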

Next, you can create a NumPy array that uses that C array without copying; see the discussion here, or search for PyArray_SimpleNewFromData.

I wouldn't expect SWIG to do all of this for you automatically; instead, you should probably write a wrapper for your getValues function yourself, something like getValuesAsNumPyArray.
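As a rough sketch of what such a wrapper could be built on, the C++ side only needs to expose a data pointer and a length. Those two values are what a hand-written getValuesAsNumPyArray would pass to PyArray_SimpleNewFromData (after a const_cast, since that function takes a non-const void*). The names MyObject, Uint32View and getValuesView are hypothetical, and the Python C API part is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view: just a pointer and a length.
struct Uint32View {
    const std::uint32_t* data;
    std::size_t size;
};

// Hypothetical stand-in for the wrapped class.
class MyObject {
public:
    std::vector<std::uint32_t> values;
    const std::vector<std::uint32_t>& getValues() const { return values; }
};

// Builds the view without copying any elements. The view is only valid
// while `obj` is alive and its vector is not resized or reallocated.
inline Uint32View getValuesView(const MyObject& obj) {
    const std::vector<std::uint32_t>& v = obj.getValues();
    return Uint32View{v.data(), v.size()};
}
```

Because the view aliases the vector, a change made through the C++ object is visible through the view (and would be visible through a NumPy array built on the same pointer).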

answered Sep 26 '22 08:09 by piokuc


It seems that PyArray_SimpleNewFromData would require you to do your own memory management. If memory management is already handled on the C++ side (that is, Python is not responsible for the memory), you can simply use np.asarray to get a NumPy array that shares memory with the C++ vector. In Cython, for example:

# distutils: language = c++
from libcpp.vector cimport vector
import numpy as np

cdef vector[double] vec
vec.push_back(1)
vec.push_back(2)
cdef double *vec_ptr = &vec[0]    # pointer to vec's contiguous storage; vec.data() also works with C++11
cdef double[::1] vec_view = <double[:vec.size()]>vec_ptr    # cast the pointer to a typed memoryview (no copy)
vec_npr = np.asarray(vec_view)    # NumPy array sharing vec's memory (no copy)
# vec must stay alive and must not reallocate while vec_npr is in use
print(vec_npr)    # [1. 2.]

The "Wrapping C and C++ Arrays" section in chapter 10 of Kurt Smith's Cython book provides good examples of this. Also see "Coercion to NumPy" in the official Cython user guide.
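Because the array returned by np.asarray does not own its memory, it is only valid while the vector is alive and has not reallocated. A small C++ sketch of that constraint, assuming you can reserve the final size up front (fillWithoutReallocation is a hypothetical helper): after reserve(n), push_back does not reallocate until the size would exceed the capacity, so data() stays stable while the vector is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills `v` with 0..n-1 after reserving capacity and
// reports whether data() stayed the same throughout. A stable pointer matters
// because a NumPy array built over v.data() does not own the memory; if the
// vector reallocates or is destroyed, the array points at freed storage.
inline bool fillWithoutReallocation(std::vector<std::uint32_t>& v, std::size_t n) {
    v.clear();
    v.reserve(n);  // capacity() >= n after this call
    if (n == 0) {
        return true;
    }
    v.push_back(0u);
    const std::uint32_t* first = v.data();  // the address a view would alias
    for (std::size_t i = 1; i < n; ++i) {
        // size() never exceeds capacity() here, so no reallocation happens
        v.push_back(static_cast<std::uint32_t>(i));
        if (v.data() != first) {
            return false;
        }
    }
    return true;
}
```

If the final size is not known in advance, the safer design is to create the NumPy view only after the vector has stopped growing.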

answered Sep 23 '22 08:09 by Yibo Yang