Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to traverse a graph in parallel with 2 or n processes

I am searching on the internet in order to find some algorithm that can traverse a graph in parallel using 2 or n processes without one process stepping into a previously visited node of the other so I can speed up the total scanning task of the whole graph, but I can't find anything. Is there any algorithm that can help me do such task in parallel? is it worth it?

Note : n processes share the same memory of visited and tovisit nodes

thank you

like image 865
themhz Avatar asked Nov 30 '12 17:11

themhz


People also ask

Which graph has two types of traversal algorithms?

The following graph can be represented as G ( {A, B, C, D, E}, { (A, B), (B, D), (D, E), (B, C), (C, A)}) The graph has two types of traversal algorithms. These are called the Breadth First Search and Depth First Search. The Breadth First Search (BFS) traversal is an algorithm, which is used to visit all of the nodes of a given graph.

What is the goal of a graph traversal?

The goal of a graph traversal, generally, is to find all nodes reachable from a given set of root nodes. In an undirected graph we follow all edges; in a directed graph we follow only out-edges.

How do you know if a graph is traversable?

A connected graph ‘G’ is traversable if and only if the number of vertices with odd degree in G is exactly 2 or 0. A connected graph G can contain an Euler’s path, but not an Euler’s circuit, if it has exactly two vertices with an odd degree. Note − This Euler path begins with a vertex of odd degree and ends with the other vertex of odd degree.

Can parallel tasks be used to traverse a tree data structure?

Thank you. The following example shows two ways in which parallel tasks can be used to traverse a tree data structure. The creation of the tree itself is left as an exercise.


2 Answers

You can try the consumer-producer model for traversing the graph - but with some modifications from the pure model:

  • Read and write to the queue in blocks, rather then element at a time, also update the visited set in blocks. It will save you the synchronization time - which will be needed to be done less frequently.
  • When you do modify the queue (and visited set) - you should do some extra work to make sure you don't add data that was already visited since the set was last checked.

Note that with this approach - you are more then likely to search some vertices a few times - but you can bound it with the frequency the queue and visited set are updated.

Will it worth it? It is hard to say in these things - it is dependent on a lot of things (graph structure, size, queue implementation, ...).

You should run a few tests and try to fine tune the parameter for "how often to update", and check which is better empirically. You should use statistical tools (wilcoxon test is the de-facto standard for this usually) and determine if one is better then the other.

like image 142
amit Avatar answered Nov 10 '22 15:11

amit


Unless the bulk of the time is spent on actual traversal, you could traverse the graph on a single thread, and queue up the work at each node to be processed in parallel from multiple processes. Once you have work in a queue, you can use a simple producer-consumer model.

like image 33
Scott Wegner Avatar answered Nov 10 '22 14:11

Scott Wegner