I have a C++ function that returns a std::vector and, using pybind11, I would like to return the contents of that vector as a NumPy array without copying the vector's underlying data into a raw data array.
Current Attempt
In this well-written SO answer, the author demonstrates how to ensure that a raw data array created in C++ is freed when the NumPy array's reference count drops to zero. I tried to write a version of this using std::vector instead:
// aside - I made a templated version of the wrapper from which
// I create specific instances in the PYBIND11_MODULE definition:
//
// m.def("my_func", &wrapper<int>, ...)
// m.def("my_func", &wrapper<float>, ...)
//
template <typename T>
py::array_t<T> wrapper(py::array_t<T> input) {
    auto proxy = input.template unchecked<1>();
    std::vector<T> result = compute_something_returns_vector(proxy);
    // give memory cleanup responsibility to the NumPy array
    py::capsule free_when_done(result.data(), [](void *f) {
        auto foo = reinterpret_cast<T *>(f);
        delete[] foo;
    });
    return py::array_t<T>({result.size()}, // shape
                          {sizeof(T)},     // stride
                          result.data(),   // data pointer
                          free_when_done);
}
Observed Issues
However, when I call this from Python, I observe two things: (1) the data in the output array is garbage, and (2) when I manually delete the NumPy array, the process aborts (SIGABRT) with the following error:
python3(91198,0x7fff9f2c73c0) malloc: *** error for object 0x7f8816561550: pointer being freed was not allocated
My guess is that this issue has to do with the line "delete[] foo", which presumably is being called with foo set to result.data(). This is not the way to deallocate a std::vector.
Possible Solutions
One possible solution is to create a T *ptr = new T[result.size()] and copy the contents of result to this raw data array. However, in some cases the results might be large, and I want to avoid spending all that time on the allocation and copy. (Though perhaps the copy is not as expensive as I think.)
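For reference, the copy option is only a few lines. Here is a sketch with hypothetical helpers copy_to_raw and free_raw. Because the buffer comes from new[], the delete[] deleter in the question's capsule would be the correct release for it. The cost is one extra allocation plus an O(n) copy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Copies the vector into a buffer allocated with new[], so that a
// delete[]-based capsule deleter matches how the memory was obtained.
template <typename T>
T* copy_to_raw(const std::vector<T>& v) {
    T* ptr = new T[v.size()];
    std::copy(v.begin(), v.end(), ptr);
    return ptr;
}

// Same shape as the capsule deleter in the question: void* in, delete[] out.
template <typename T>
void free_raw(void* p) {
    delete[] static_cast<T*>(p);
}
```

In the pybind11 wrapper, the pointer returned by copy_to_raw would be what is passed to both the capsule and the array constructor. The vector can then be destroyed normally.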
Also, I don't know much about std::allocator, but perhaps there is a way to allocate the raw data array needed by the output vector outside the compute_something_returns_vector() call, and then discard the std::vector afterwards while retaining the underlying raw data array?
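A custom allocator may not be necessary here. With the default std::allocator, the std::vector move constructor runs in constant time, so it takes over the existing buffer instead of allocating and copying. The vector object itself can therefore be relocated to the heap without touching its elements. Here is a minimal sketch using a hypothetical helper, adopt_on_heap:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Moves a vector into heap storage. The new vector adopts the original
// buffer, so data() keeps pointing at the same elements; nothing is copied.
// The caller owns the returned pointer and must delete it exactly once.
template <typename T>
std::vector<T>* adopt_on_heap(std::vector<T>&& v) {
    return new std::vector<T>(std::move(v));
}
```

The returned pointer is what a capsule could own. Deleting it releases the buffer through the vector's own allocator, which sidesteps the question of how to free a vector's buffer by hand.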
The final option is to rewrite compute_something_returns_vector.
Answer
After an offline discussion with a colleague, I resolved my problem. I don't want to commit an SO faux pas, so I won't accept my own answer. However, for the sake of using SO as a catalog of information, I want to provide the answer here for others.
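The problem was simple: result was a local (stack-allocated) std::vector, so its destructor freed the underlying buffer as soon as wrapper returned. That left the NumPy array pointing at freed memory, which explains the garbage. The capsule's delete[] then later tried to free that buffer again, which explains the malloc error. The vector needs to be heap-allocated so that free_when_done can take ownership of it. Below is an example fix: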
{
    // ... snip ...
    std::vector<T> *result = new std::vector<T>(compute_something_returns_vector(proxy));
    py::capsule free_when_done(result, [](void *f) {
        auto foo = reinterpret_cast<std::vector<T> *>(f);
        delete foo;
    });
    return py::array_t<T>({result->size()}, // shape
                          {sizeof(T)},      // stride
                          result->data(),   // data pointer
                          free_when_done);
}
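The ownership logic of this fix can be checked outside Python. In the sketch below, FakeCapsule is a hypothetical stand-in for py::capsule, not pybind11's API. It models only "call this destructor on this pointer once, when the owner goes away". The function delete_vector mirrors the capsule's lambda. In the real module, the NumPy array holds the capsule as its base object, so the vector lives exactly as long as the array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for py::capsule: stores an opaque pointer and calls a
// plain function-pointer destructor on it exactly once, at destruction.
class FakeCapsule {
public:
    FakeCapsule(void* ptr, void (*destructor)(void*))
        : ptr_(ptr), destructor_(destructor) {}
    FakeCapsule(const FakeCapsule&) = delete;
    FakeCapsule& operator=(const FakeCapsule&) = delete;
    ~FakeCapsule() { destructor_(ptr_); }

private:
    void* ptr_;
    void (*destructor_)(void*);
};

int g_vectors_deleted = 0;  // instrumentation only

// Matches the answer's deleter: the capsule owns a std::vector<T>*, so the
// vector (and with it, its buffer) is released with plain delete.
template <typename T>
void delete_vector(void* f) {
    delete static_cast<std::vector<T>*>(f);
    ++g_vectors_deleted;
}
```

While the FakeCapsule is alive, the data() pointer stays readable. When it is destroyed, the vector is deleted exactly once.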
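I was also able to implement a solution using std::unique_ptr that doesn't require a free_when_done capsule. Be aware, though, that when py::array_t is given a data pointer but no base object, pybind11 copies the data into a new buffer owned by NumPy. That copy is what keeps this version safe after the unique_ptr frees the vector, but it also means this version does not avoid the copy. I wasn't able to run Valgrind with either solution, so I'm not 100% sure that the memory held by the vector was freed appropriately. (Valgrind + Python is a mystery to me.) For completeness, below is the std::unique_ptr approach: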
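{
    // ... snip ...
    std::unique_ptr<std::vector<T>> result =
        std::make_unique<std::vector<T>>(compute_something_returns_vector(proxy));
    // no base object is passed, so pybind11 copies the data into a
    // NumPy-owned buffer before result frees the vector at scope exit
    return py::array_t<T>({result->size()}, // shape
                          {sizeof(T)},      // stride
                          result->data());  // data pointer
}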
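To combine std::unique_ptr with zero copy, one common pattern is to keep the vector in a unique_ptr until the capsule has been constructed, then call release() so that the capsule becomes the sole owner. If the capsule construction threw, the unique_ptr would still free the vector. The sketch below uses a hypothetical FakeCapsule stand-in (not pybind11's API) and a hypothetical hand_off helper, so that it compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for py::capsule: calls the destructor on the stored
// pointer exactly once, when the capsule itself is destroyed.
class FakeCapsule {
public:
    FakeCapsule(void* ptr, void (*destructor)(void*))
        : ptr_(ptr), destructor_(destructor) {}
    FakeCapsule(const FakeCapsule&) = delete;
    FakeCapsule& operator=(const FakeCapsule&) = delete;
    ~FakeCapsule() { destructor_(ptr_); }

private:
    void* ptr_;
    void (*destructor_)(void*);
};

int g_vectors_deleted = 0;  // instrumentation only

template <typename T>
void delete_vector(void* f) {
    delete static_cast<std::vector<T>*>(f);
    ++g_vectors_deleted;
}

// The unique_ptr owns the vector while the capsule is being built; only after
// the capsule exists is ownership transferred to it via release().
template <typename T>
std::unique_ptr<FakeCapsule> hand_off(std::unique_ptr<std::vector<T>> result) {
    auto capsule = std::make_unique<FakeCapsule>(result.get(), &delete_vector<T>);
    result.release();  // the capsule is now the only owner
    return capsule;
}
```

In the pybind11 wrapper, the analogous steps would be to construct py::capsule from result.get(), call result.release(), and pass the capsule as the array's base. The array then refers to the vector's own buffer instead of a copy.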
I was, however, able to inspect the addresses of the vectors allocated in both the Python and C++ code, and confirmed that, with the capsule-based version, no copies of the output of compute_something_returns_vector() were made.