Wrapping an std::vector using boost::python vector_indexing_suite

I am working on a C++ library with Python bindings (using boost::python) representing data stored in a file. The majority of my semi-technical users will be using Python to interact with it, so I need to make it as Pythonic as possible. However, I will also have C++ programmers using the API, so I do not want to compromise on the C++ side to accommodate Python bindings.

A large part of the library will be made out of containers. To make things intuitive for the python users, I would like them to behave like python lists, i.e.:

# an example compound class
class Foo:
    def __init__( self, _val ):
        self.val = _val

# add it to a list
foo = Foo(0.0)
vect = []
vect.append(foo)

# change the value of the *original* instance
foo.val = 666.0
# which also changes the instance inside the container
print vect[0].val # outputs 666.0

The test setup

#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
#include <boost/python/register_ptr_to_python.hpp>
#include <boost/shared_ptr.hpp>
#include <vector>

struct Foo {
    double val;

    Foo(double a) : val(a) {}
    bool operator == (const Foo& f) const { return val == f.val; }
};

/* insert the test module wrapping code here */

int main() {
    Py_Initialize();
    inittest();

    boost::python::object globals = boost::python::import("__main__").attr("__dict__");

    boost::python::exec(
        "import test\n"

        "foo = test.Foo(0.0)\n"         // make a new Foo instance
        "vect = test.FooVector()\n"     // make a new vector of Foos
        "vect.append(foo)\n"            // add the instance to the vector

        "foo.val = 666.0\n"             // assign a new value to the instance
                                        //   which should change the value in vector

        "print 'Foo =', foo.val\n"      // and print the results
        "print 'vector[0] =', vect[0].val\n",

        globals, globals
    );

    return 0;
}

The way of the shared_ptr

Using the shared_ptr, I can get the same behaviour as above, but it also means that I have to represent all data in C++ using shared pointers, which is not nice from many points of view.

BOOST_PYTHON_MODULE( test ) {
    // wrap Foo
    boost::python::class_< Foo, boost::shared_ptr<Foo> >("Foo", boost::python::init<double>())
        .def_readwrite("val", &Foo::val);

    // wrap vector of shared_ptr Foos
    boost::python::class_< std::vector < boost::shared_ptr<Foo> > >("FooVector")
        .def(boost::python::vector_indexing_suite<std::vector< boost::shared_ptr<Foo> >, true >());
}

In my test setup, this produces the same output as pure Python:

Foo = 666.0
vector[0] = 666.0

The way of the vector<Foo>

Using a vector directly gives a nice clean setup on the C++ side. However, the result does not behave in the same way as pure Python.

BOOST_PYTHON_MODULE( test ) {
    // wrap Foo
    boost::python::class_< Foo >("Foo", boost::python::init<double>())
        .def_readwrite("val", &Foo::val);

    // wrap vector of Foos
    boost::python::class_< std::vector < Foo > >("FooVector")
        .def(boost::python::vector_indexing_suite<std::vector< Foo > >());
}

This produces:

Foo = 666.0
vector[0] = 0.0

Which is "wrong" - changing the original instance did not change the value inside the container.

I hope I don't want too much

Interestingly enough, this code works no matter which of the two encapsulations I use:

footwo = vect[0]
footwo.val = 555.0
print vect[0].val

Which means that boost::python is able to deal with "fake shared ownership" (via its by_proxy return mechanism). Is there any way to achieve the same while inserting new elements?

However, if the answer is no, I'd love to hear other suggestions - is there an example in the Python toolkit where a similar collection encapsulation is implemented, but which does not behave as a python list?

Thanks a lot for reading this far :)

Martin Prazak asked Nov 22 '14 12:11


2 Answers

Due to the semantic differences between the languages, it is often very difficult to apply a single reusable solution to all scenarios when collections are involved. The largest issue is that, while Python collections directly support references, C++ collections require a level of indirection, such as shared_ptr element types. Without this indirection, C++ collections cannot support the same functionality as Python collections. For instance, consider two indexes that refer to the same object:

s = Spam()
spams = []
spams.append(s)
spams.append(s)
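
In plain C++ terms, the same two-slot aliasing only appears when the elements are pointer-like. The following is a minimal sketch using only the standard library; the Spam struct and both function names are hypothetical stand-ins, not part of the original code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the Spam class in the Python snippet.
struct Spam { int val = 0; };

// Two slots of pointer-like elements can refer to one object.
inline bool slots_alias_with_pointers() {
  auto s = std::make_shared<Spam>();
  std::vector<std::shared_ptr<Spam>> spams;
  spams.push_back(s);
  spams.push_back(s);
  assert(spams.size() == 2);
  s->val = 7;  // visible through both slots
  return spams[0]->val == 7 && spams[1]->val == 7 && spams[0] == spams[1];
}

// Two slots of value elements are always independent copies.
inline bool slots_alias_with_values() {
  Spam s;
  std::vector<Spam> spams;
  spams.push_back(s);
  spams.push_back(s);
  assert(spams.size() == 2);
  spams[0].val = 7;  // changes only the first copy
  return spams[1].val == 7;
}
```

The first function reports aliasing and the second does not, which is the gap any wrapper around std::vector<Spam> has to work around.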

Without pointer-like element types, a C++ collection could not have two indexes referring to the same object. Nevertheless, depending on usage and needs, there may be options that allow for a Pythonic-ish interface for the Python users while still maintaining a single implementation for C++.

  • The most Pythonic solution would be to use a custom converter that would convert a Python iterable object to a C++ collection. See this answer for implementation details. Consider this option if:
    • The collection's elements are cheap to copy.
    • The C++ functions operate only on rvalue types (i.e., std::vector<> or const std::vector<>&). This limitation prevents C++ from making changes to the Python collection or its elements.
  • Enhance vector_indexing_suite capabilities, reusing as many capabilities as possible, such as its proxies for safely handling index deletion and reallocation of the underlying collection:
    • Expose the model with a custom HeldType that functions as a smart pointer and delegates to either the instance or the element proxy objects returned from vector_indexing_suite.
    • Monkey patch the collection's methods that insert elements into the collection so that the custom HeldType will be set to delegate to an element proxy.

When exposing a class to Boost.Python, the HeldType is the type of object that gets embedded within a Boost.Python object. When accessing the wrapped type's object, Boost.Python invokes get_pointer() for the HeldType. The object_holder class below provides the ability to return a handle to either an instance it owns or to an element proxy:

/// @brief smart pointer type that will delegate to a python
///        object if one is set.
template <typename T>
class object_holder
{
public:

  typedef T element_type;

  object_holder(element_type* ptr)
    : ptr_(ptr),
      object_()
  {}

  element_type* get() const
  {
    if (!object_.is_none())
    {
      return boost::python::extract<element_type*>(object_)();
    }
    return ptr_ ? ptr_.get() : NULL;
  }

  void reset(boost::python::object object)
  {
    // Verify the object holds the expected element.
    boost::python::extract<element_type*> extractor(object);
    if (!extractor.check()) return;

    object_ = object;
    ptr_.reset();
  }

private:
  boost::shared_ptr<element_type> ptr_;
  boost::python::object object_;
};

/// @brief Helper function used to extract the pointed to object from
///        an object_holder.  Boost.Python will use this through ADL.
template <typename T>
T* get_pointer(const object_holder<T>& holder)
{
  return holder.get();
}

With the indirection supported, the only thing remaining is patching the collection to set the object_holder. One clean and reusable way to support this is to use def_visitor. This is a generic interface that allows for class_ objects to be extended non-intrusively. For instance, the vector_indexing_suite uses this capability.

The custom_vector_indexing_suite class below monkey patches the append() method to delegate to the original method, and then invokes object_holder.reset() with a proxy to the newly set element. This results in the object_holder referring to the element contained within the collection.

/// @brief Indexing suite that resets the element's HeldType to
///        that of the proxy during element insertion.
template <typename Container,
          typename HeldType>
class custom_vector_indexing_suite
  : public boost::python::def_visitor<
      custom_vector_indexing_suite<Container, HeldType>>
{
private:

  friend class boost::python::def_visitor_access;

  template <typename ClassT>
  void visit(ClassT& cls) const
  {
    // Define vector indexing support.
    cls.def(boost::python::vector_indexing_suite<Container>());

    // Monkey patch element setters with custom functions that
    // delegate to the original implementation then obtain a 
    // handle to the proxy.
    cls
      .def("append", make_append_wrapper(cls.attr("append")))
      // repeat for __setitem__ (slice and non-slice) and extend
      ;
  }

  /// @brief Returns a patched 'append' function.
  static boost::python::object make_append_wrapper(
    boost::python::object original_fn)
  {
    namespace python = boost::python;
    return python::make_function([original_fn](
          python::object self,
          HeldType& value)
        {
          // Copy into the collection.
          original_fn(self, value.get());
          // Reset handle to delegate to a proxy for the newly copied element.
          value.reset(self[-1]);
        },
      // Call policies.
      python::default_call_policies(),
      // Describe the signature.
      boost::mpl::vector<
        void,           // return
        python::object, // self (collection)
        HeldType>()     // value
      );
  }
};

Wrapping needs to occur at runtime, and custom functor objects cannot be directly defined on the class via def(), so make_function() must be used. For functors, it requires both CallPolicies and an MPL front-extensible sequence representing the signature.
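
The delegation idea can also be modelled without Boost.Python, which may make the control flow easier to follow. The sketch below is a simplified illustration under stated assumptions, not how Boost.Python or vector_indexing_suite is implemented: Handle, Spam, and the free function append are hypothetical names, and the handle tracks its element by index rather than through a proxy object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Spam { int val; };

// Hypothetical stand-in for object_holder: owns a Spam until it is
// appended, then delegates to a slot in the container by index.
class Handle {
public:
  explicit Handle(int val) : owned_(new Spam{val}) {}

  Spam& get() {
    return container_ ? (*container_)[index_] : *owned_;
  }

  // Re-point the handle at container[index] and drop the private copy.
  void reset(std::vector<Spam>& container, std::size_t index) {
    assert(index < container.size());
    container_ = &container;
    index_ = index;
    owned_.reset();
  }

private:
  std::unique_ptr<Spam> owned_;
  std::vector<Spam>* container_ = nullptr;
  std::size_t index_ = 0;
};

// Mirrors the patched append(): copy into the collection first,
// then redirect the handle to the newly copied element.
inline void append(std::vector<Spam>& spams, Handle& handle) {
  spams.push_back(handle.get());
  handle.reset(spams, spams.size() - 1);
}
```

After append(), writes through the handle land in the container and writes to the container are visible through the handle. Appending the same handle a second time re-points it at the new slot, so the earlier slot keeps its last value.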


Here is a complete example that demonstrates using the object_holder to delegate to proxies and custom_vector_indexing_suite to patch the collection.

#include <vector>

#include <boost/mpl/vector.hpp>
#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
#include <boost/shared_ptr.hpp>

/// @brief Mockup type.
struct spam
{
  int val;

  spam(int val) : val(val) {}
  bool operator==(const spam& rhs) const { return val == rhs.val; }
};

/// @brief Mockup function that operates on a collection of spam instances.
void modify_spams(std::vector<spam>& spams)
{
  for (auto& spam : spams)
    spam.val *= 2;
}

/// @brief smart pointer type that will delegate to a python
///        object if one is set.
template <typename T>
class object_holder
{
public:

  typedef T element_type;

  object_holder(element_type* ptr)
    : ptr_(ptr),
      object_()
  {}

  element_type* get() const
  {
    if (!object_.is_none())
    {
      return boost::python::extract<element_type*>(object_)();
    }
    return ptr_ ? ptr_.get() : NULL;
  }

  void reset(boost::python::object object)
  {
    // Verify the object holds the expected element.
    boost::python::extract<element_type*> extractor(object);
    if (!extractor.check()) return;

    object_ = object;
    ptr_.reset();
  }

private:
  boost::shared_ptr<element_type> ptr_;
  boost::python::object object_;
};

/// @brief Helper function used to extract the pointed to object from
///        an object_holder.  Boost.Python will use this through ADL.
template <typename T>
T* get_pointer(const object_holder<T>& holder)
{
  return holder.get();
}

/// @brief Indexing suite that resets the element's HeldType to
///        that of the proxy during element insertion.
template <typename Container,
          typename HeldType>
class custom_vector_indexing_suite
  : public boost::python::def_visitor<
      custom_vector_indexing_suite<Container, HeldType>>
{
private:

  friend class boost::python::def_visitor_access;

  template <typename ClassT>
  void visit(ClassT& cls) const
  {
    // Define vector indexing support.
    cls.def(boost::python::vector_indexing_suite<Container>());

    // Monkey patch element setters with custom functions that
    // delegate to the original implementation then obtain a 
    // handle to the proxy.
    cls
      .def("append", make_append_wrapper(cls.attr("append")))
      // repeat for __setitem__ (slice and non-slice) and extend
      ;
  }

  /// @brief Returns a patched 'append' function.
  static boost::python::object make_append_wrapper(
    boost::python::object original_fn)
  {
    namespace python = boost::python;
    return python::make_function([original_fn](
          python::object self,
          HeldType& value)
        {
          // Copy into the collection.
          original_fn(self, value.get());
          // Reset handle to delegate to a proxy for the newly copied element.
          value.reset(self[-1]);
        },
      // Call policies.
      python::default_call_policies(),
      // Describe the signature.
      boost::mpl::vector<
        void,           // return
        python::object, // self (collection)
        HeldType>()     // value
      );
  }

  // .. make_setitem_wrapper
  // .. make_extend_wrapper
};

BOOST_PYTHON_MODULE(example)
{
  namespace python = boost::python;

  // Expose spam.  Use a custom holder to allow for transparent delegation
  // to different instances.
  python::class_<spam, object_holder<spam>>("Spam", python::init<int>())
    .def_readwrite("val", &spam::val)
    ;

  // Expose a vector of spam.
  python::class_<std::vector<spam>>("SpamVector")
    .def(custom_vector_indexing_suite<
      std::vector<spam>, object_holder<spam>>())
    ;

  python::def("modify_spams", &modify_spams);
}

Interactive usage:

>>> import example
>>> spam = example.Spam(5)
>>> spams = example.SpamVector()
>>> spams.append(spam)
>>> assert(spams[0].val == 5)
>>> spam.val = 21
>>> assert(spams[0].val == 21)
>>> example.modify_spams(spams)
>>> assert(spam.val == 42)
>>> spams.append(spam)
>>> spam.val = 100
>>> assert(spams[1].val == 100)
>>> assert(spams[0].val == 42) # The container does not provide indirection.

As the vector_indexing_suite is still being used, the underlying C++ container should only be modified using the Python object's API. For instance, invoking push_back on the container may cause a reallocation of the underlying memory and cause problems with existing Boost.Python proxies. On the other hand, one can safely modify the elements themselves, such as was done via the modify_spams() function above.

Tanner Sansbury answered Sep 21 '22 15:09


Unfortunately, the answer is no, you can't do what you want. In Python, everything is a pointer, and lists are containers of pointers. The C++ vector of shared pointers works because the underlying data structure is more or less equivalent to a Python list. What you are requesting is to have a C++ vector that stores its elements by value act like a vector of pointers, which can't be done.

Let's see what's happening in python lists, with C++ equivalent pseudocode:

foo = Foo(0.0)     # Foo* foo = new Foo(0.0)
vect = []          # std::vector<Foo*> vect
vect.append(foo)   # vect.push_back(foo)

At this point, foo and vect[0] both point to the same allocated memory, so changing *foo changes *vect[0].

Now with the vector<Foo> version:

foo = Foo(0.0)      # Foo* foo = new Foo(0.0)
vect = FooVector()  # std::vector<Foo> vect
vect.append(foo)    # vect.push_back(*foo)

Here, vect[0] has its own allocated memory, and is a copy of *foo. Fundamentally, you can't make vect[0] be the same memory as *foo.

On a side note, be careful with lifetime management of footwo when using std::vector<Foo>:

footwo = vect[0]    # Foo* footwo = &vect[0]

A subsequent append may require moving the allocated storage for the vector, and may invalidate footwo (&vect[0] may change).
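
The invalidation risk can be observed with the standard library alone. This is a minimal sketch, assuming a mainstream standard library with the default allocator; the function names are hypothetical, and addresses are stored as integers so that no dangling pointer is ever read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Foo { double val; };

// Returns true if appending past capacity moved the vector's storage.
inline bool append_moves_storage() {
  std::vector<Foo> vect;
  vect.push_back(Foo{0.0});
  // Fill the spare capacity so the next push_back must reallocate.
  while (vect.size() < vect.capacity()) vect.push_back(Foo{0.0});
  assert(vect.size() == vect.capacity());
  // Record the address as an integer; the pointer itself would dangle.
  const auto before = reinterpret_cast<std::uintptr_t>(&vect[0]);
  vect.push_back(Foo{1.0});
  const auto after = reinterpret_cast<std::uintptr_t>(&vect[0]);
  return before != after;
}

// An index, unlike a raw pointer, stays valid across reallocation.
inline double read_by_index_after_growth() {
  std::vector<Foo> vect;
  vect.push_back(Foo{666.0});
  const std::size_t index = 0;
  for (int i = 0; i < 1000; ++i) vect.push_back(Foo{0.0});
  return vect[index].val;
}
```

A raw Foo* taken before the growth would point at freed storage afterwards, whereas looking the element up again by index stays safe as long as no elements are inserted or removed before it.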

Jay West answered Sep 19 '22 15:09