reducing memory requirements for adjacency list

I'm using adjacency_list<vecS, vecS, bidirectionalS, ...> extensively, and I have so many graphs loaded at once that memory becomes an issue. I'm doing static program analysis and store the call graph and flow graphs of the disassembled binary in Boost graphs. Thus I can have several tens of thousands of functions (== flow graphs) and one gigantic call graph. I'd really like to reduce memory usage for my graphs while still using the BGL.

Since my graphs are static after loading, and the edge and vertex counts are known beforehand, I see huge potential for optimization. For example, I'd like to allocate a single buffer for all vertices/edges of a single graph and let the graph just store indices into that buffer.

More questions:
1) What's the memory overhead of using vertex and edge properties? I have quite a few of them.
2) Is it possible to convince the BGL to use the shrink-to-fit idiom? As I understand it, the adjacency lists use push_back to add edges. Is it possible to reduce memory usage by swapping the resulting vector with a copy of itself? Maybe by copying the whole graph?
3) Is it possible to use Boost pool allocators with the BGL? As far as I can tell, the BGL currently performs lots of small allocations - I'd really like to avoid that for space and runtime efficiency reasons.

Has anyone already built a BGL version optimized for memory usage? Should I try using the existing graph structures and augment them with custom allocators or some such, or is it more fruitful to write my own implementation and try to stay interface-compatible with the BGL so I can continue to use its algorithms?

best regards,

Sören
asked Sep 01 '09 by BuschnicK

2 Answers

There is a little-known graph type called "compressed sparse row" graph in the BGL. It seems to be quite new and is not linked from the index pages. It does, however, employ a beautiful little trick to get the graph representation as small as possible: http://www.boost.org/doc/libs/1_40_0/libs/graph/doc/compressed_sparse_row.html

Using this for some of our graphs, I have already been able to reduce total memory usage by 20% - so it is a very worthwhile optimization indeed.

It also stores the vertex/edge properties in vectors thus giving the smallest possible overhead for those as well.

Note that the version shipping with the latest Boost 1.40 supports directed graphs only (as opposed to bidirectional). If you need to be able to efficiently iterate over a vertex's out-edges and in-edges (as I did), you need to check out the Boost trunk from Subversion. Jeremiah was very helpful in adding that feature on my request.

answered Sep 26 '22 by BuschnicK


  1. The overhead depends on which version you're using and whether you've gone for "bundled" properties or not. I've only used bundled properties, and reading the code I'd expect each property bundle to cost you 2 pointers + sizeof the bundle type you're using + sizeof each of the attached properties. None of the concept-checking stuff is left in the binary, afaik. Since you have the code, though, why not just measure the cost? If you don't have any tooling to help, try generating graphs of known sizes in buffers of known sizes. Something will fail eventually, and at that point you should have counts.

  2. Have you tried calling adjacency_list<blah>.swap(adjacency_list& x)? I'd hope that would shrink the containers appropriately.

  3. ???

I started writing something like the system you're describing, but eventually gave up on the BGL and switched to coding my own algorithms to run on an SQLite database of all the linker symbols.

answered Sep 23 '22 by Mike Burrows