Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Designing around bundled properties in the Boost Graph Library

I am porting some graph code out of Python (networkx) and into C++ (BGL). In my Python code, the vertices and edges of the graph are client-defined objects implementing an established interface; I go on to call a bunch of methods on them. Everything is fine.

Naively, it would seem that BGL is meant to support a similar design pattern with "bundled properties." These basically allow one to define custom types for the vertices and edges by passing certain template parameters:

adjacency_list<OutEdgeList, VertexList,
               Directed, VertexProperties,
               EdgeProperties, GraphProperties,
               EdgeList>

The custom vertex and edge types here are given by VertexProperties and EdgeProperties.

In working on this port I've noticed a couple of things that make me think that maybe BGL's bundled properties interface is really only meant to support (more-or-less) immutable types:

  • Edge and vertex "descriptors"

    If you put something into the graph, you get back a "descriptor" that you have to use to reference it from there on. There are descriptors for edges and vertices, and they are the "keys" -- implemented as immutables -- to the graph. So if you have a vertex and you want to find neighboring vertices, you have to (a) get the current vertex descriptor, (b) use BGL methods to find the descriptors of its neighbors, and finally (c) reference each of the neighbors via their respective descriptors.

    The net result of all this bookkeeping is an apparent need for additional containers -- std::map, say -- to provide reverse-lookup from vertices and edges (again, types VertexProperty and EdgeProperty) to their descriptors.

  • "BGL isn't meant to store pointers."

    I spotted this claim in a similar SO question but haven't been able to verify it in the Boost docs anywhere. From the ensuing discussion I am left to speculate that the constraint may actually be a bit stronger: "BGL isn't meant to directly reference the heap." This doesn't seem entirely reasonable, though, as the container types are configurable (OutEdgeList and VertexList, above) and default standard things like vectors.

I'm a BGL n00b and am having trouble understanding what's intended with bundled properties. (And, frankly, I feel a bit overwhelmed by the programming model in general -- "properties", "concepts", "traits", "descriptors", AHHHH!) Questions:

  1. Do BGL graphs efficiently support complex and possibly heap-bound types for VertexProperty and EdgeProperty? Or are these meant to be lightweight containers for immutable data?

  2. If the former, is there any way to get around all the descriptor bookkeeping?

  3. If the latter, what's the "right way" to deal with large, complicated things that we might want to stick in a BGL graph?

like image 908
BrianTheLion Avatar asked Apr 29 '15 22:04

BrianTheLion


1 Answers

  1. Do BGL graphs efficiently support complex and possibly heap-bound types for VertexProperty and EdgeProperty? Or are these meant to be lightweight containers for immutable data?

Of course. You can do it anyway you want. Point in case: you could make the bundle hold an (immutable or mutable) pointer to a heap allocated "complex" type that is - of course - entirely mutable. Now,

I'd suggest using your preferred ownership adaptor (unique_ptr, scoped_ptr, shared_ptr, what not). Look at std::string: it too is "heap based", but you didn't ever worry about using it in a property bundle, did you?

  1. If the former, is there any way to get around all the descriptor bookkeeping?

There is not strictly any "descriptor bookkeeping". There might be depending on the graph model. But in general, descriptor is really an abstraction of the underlying container iterator model (not that this could be iterators across multiple containers, e.g. the edges(EdgeListgraph) as opposed to out_edges(v, IncidenceGraph) for adjacency_list<>.

The point is to decouple the logic from assumptions about the storage model. Even with some pretty unsightly void* passing-around, the compiler should compile it down to the same code as direct iterator access. In this sense the "bookkeeping" is usually perceptual, and you are likely picking up on mental load of having the extra conceptual layer.

  1. If the latter, what's the "right way" to deal with large, complicated things that we might want to stick in a BGL graph?

Oops. I think I accidentally addressed this under 1. The absolute simplest thing to use might be refcounted sharing. Your specific situation might allow for more efficient solutions.


CAPITA SELECTA

  • "BGL isn't meant to store pointers."

Well, not as bundles, perhaps. Even here, it depends solely on how you manage the ownership/lifetime of the pointees.

I think the linked answer is excellent. It resonates with what I said above.

  • So if you have a vertex and you want to find neighboring vertices, you have to (a) get the current vertex descriptor, (b) use BGL methods to find the descriptors of its neighbors, and finally (c) reference each of the neighbors via their respective descriptors.

Most BGL algorithms rely on the presence of vertex_index_t property (or require one to be specified as a parameter) to make sure these are low-cost operations. In fact, if you use vecS then the vertex index is the index into the vertex vector, so reverse and forward lookups are pretty simple. (You can always look at the generated assembly with optimizations enabled to see whether there are any surprises).

This answer might be inspirational: Boost Graph Library: possible to combine Bundled Properties with Interior Properties?


TL;DR / Summary

I have a feeling coming from Python you might be understandably underestimating C++ optimizing compilers in template-heavy generic library code.

What bubbles up when you sift through the layers of generic machinery in practice evaporates at compile time. Of course, the downside of this is that can be hard to see through the abstractions. But it also means that power and abstraction level aren't compromised.

BGL is one library that allows you to switch to radically different graph models, storage layout etc. with very few code changes. The goal here wasn't "ease of use" or "do what I mean" (use Java or python for that).

The goal was to make the choice for C++ /not/ remove all agility by hard-coding every single implementation detail into your entire code base. Instead, work at the level of library concepts and benefit from the freedom you retain to experiment/change your approach as requirements evolve.

like image 67
sehe Avatar answered Nov 05 '22 06:11

sehe