Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove 100,000+ nodes from a Boost graph

Tags:

c++

boost

I have a graph ( adjacency_list (listS, vecS, bidirectionalS, VertexVal) ) in which I need to delete 100,000+ nodes. Each node also contains a structure of 2 64-bit integers and another 64-bit integer. The guid check that happens in the code below is checking 1st integer in the structure.

On my laptop ( i7 2.7GHz, 16GB RAM ) it takes about 88 seconds according to VTune.

Following is how I delete the nodes:

  vertex_iterator vi,vi_end;
  boost::tie(vi, vi_end) = boost::vertices(m_graph);
  while (vi!=vi_end) {
    if (m_graph[*vi].guid.part1 == 0) {
      boost::remove_vertex(*vi,m_graph);
      boost::tie(vi, vi_end) = boost::vertices(m_graph);
    } else 
      ++vi;
  }

Vtune shows that the boost::remove_vertex() call takes 88.145 seconds. Is there a more efficient way to delete these vertices? Vtune data for boost::remove_vertex_dispatch(). This is the breakdown of the 88 seconds

like image 825
Dula Avatar asked Feb 06 '15 23:02

Dula


2 Answers

In your removal branch you re-tie() the iterators:

boost::tie(vi, vi_end) = boost::vertices(m_graph);

This will cause the loop to restart every time you restart the loop. This is exactly Schlemiel The Painter.

I'll find out whether you can trust remove_vertex not triggering a reallocation. If so, it's easily fixed. Otherwise, you'd want an indexer-based loop instead of iterator-based. Or you might be able to work on the raw container (it's a private member, though, as I remember).

Update Using vecS as the container for vertices is going to cause bad performance here:

If the VertexList template parameter of the adjacency_list was vecS, then all vertex descriptors, edge descriptors, and iterators for the graph are invalidated by this operation. <...> If you need to make frequent use of the remove_vertex() function the listS selector is a much better choice for the VertexList template parameter.


This small benchmark test.cpp compares:

  • with -DSTABLE_IT (listS)

    $ ./stable 
    Generated 100000 vertices and 5000 edges in 14954ms
    The graph has a cycle? false
    starting selective removal...
    Done in 0ms
    After: 99032 vertices and 4916 edges
    
  • without -DSTABLE_IT (vecS)

    $ ./unstable 
    Generated 100000 vertices and 5000 edges in 76ms
    The graph has a cycle? false
    starting selective removal...
    Done in 396ms
    After: 99032 vertices and 4916 edges
    
  • using filtered_graph (thanks @cv_and_he in the comments)

    Generated 100000 vertices and 5000 edges in 15ms
    The graph has a cycle? false
    starting selective removal...
    Done in 0ms
    After: 99032 vertices and 4916 edges
    Done in 13ms
    

You can clearly see that removal is much faster for listS but generating is much slower.

like image 77
sehe Avatar answered Oct 11 '22 23:10

sehe


I was able to successfully serialize the graph using Boost serialization routines into a string, parse the string and remove the nodes I didn't need and de-serialize the modified string. For 200,000 total nodes in graph and 100,000 that needs to be deleted I was able to successfully finish the operation in less than 2 seconds.

For my particular use-case each vertex has 3 64bit integers. When it needs to be deleted, I mark 2 of those integers as 0s. A valid vertex would never have a 0. When the point comes to clean up the graph - to delete the "deleted" vertices, I follow the above logic.

In the code below removeDeletedNodes() does the string parsing and removing the vertices and mapping the edge numbers.

enter image description here

like image 28
Dula Avatar answered Oct 11 '22 21:10

Dula