I am searching on the internet in order to find some algorithm that can traverse a graph in parallel using 2 or n processes without one process stepping into a previously visited node of the other so I can speed up the total scanning task of the whole graph, but I can't find anything. Is there any algorithm that can help me do such task in parallel? is it worth it?
Note : n processes share the same memory of visited and tovisit nodes
thank you
The following graph can be represented as G ( {A, B, C, D, E}, { (A, B), (B, D), (D, E), (B, C), (C, A)}) The graph has two types of traversal algorithms. These are called the Breadth First Search and Depth First Search. The Breadth First Search (BFS) traversal is an algorithm, which is used to visit all of the nodes of a given graph.
The goal of a graph traversal, generally, is to find all nodes reachable from a given set of root nodes. In an undirected graph we follow all edges; in a directed graph we follow only out-edges.
A connected graph ‘G’ is traversable if and only if the number of vertices with odd degree in G is exactly 2 or 0. A connected graph G can contain an Euler’s path, but not an Euler’s circuit, if it has exactly two vertices with an odd degree. Note − This Euler path begins with a vertex of odd degree and ends with the other vertex of odd degree.
Thank you. The following example shows two ways in which parallel tasks can be used to traverse a tree data structure. The creation of the tree itself is left as an exercise.
You can try the consumer-producer model for traversing the graph - but with some modifications from the pure model:
visited
set in blocks. It will save you the synchronization time - which will be needed to be done less frequently.visited
set) - you should do some extra work to make sure you don't add data that was already visited since the set was last checked.Note that with this approach - you are more then likely to search some vertices a few times - but you can bound it with the frequency the queue and visited
set are updated.
Will it worth it? It is hard to say in these things - it is dependent on a lot of things (graph structure, size, queue implementation, ...).
You should run a few tests and try to fine tune the parameter for "how often to update", and check which is better empirically. You should use statistical tools (wilcoxon test is the de-facto standard for this usually) and determine if one is better then the other.
Unless the bulk of the time is spent on actual traversal, you could traverse the graph on a single thread, and queue up the work at each node to be processed in parallel from multiple processes. Once you have work in a queue, you can use a simple producer-consumer model.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With