Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for allowing concurrent walk of a graph

In a directed acyclic graph describing a set of tasks to process, i need to find all tasks that can be processed concurrently. The graph has no loops and is quite small (~1000 nodes, ~2000 edges), performance is not a primary concern.

Examples with desired result:

  • [] is a group. All tasks in a group must be processed before continuing
  • [x & y] means x and y can be processed concurrently (x and y in parallel)
  • x -> y means x and y must be processed sequentially (x before y)

1

graph 1

a -> [b & c] -> c

2

graph 2

[a & e] -> b -> c -> [d & f]

3

graph 3

[ [a -> b] & [e -> f] ] -> [ [c -> d] & g ]

I do not want to actually execute the graph, but rather build a data structure that is as parallel as possible, while maintaining the order. The nomenclature and names of algorithms is not that familiar to me, so i'm having a hard time trying to find similar problems/solutions online.

like image 473
Antti Avatar asked Oct 10 '19 07:10

Antti


2 Answers

Mathematically, I would frame this problem as finding a minimally defined series-parallel partial order extending the given partial order.

I would start by transitively reducing the graph and repeatedly applying two heuristics.

  1. If x has one dependent y, and y has one dependency x, merge them into a new node z = [x → y].
  2. If x and y have the same dependencies and dependents, merge them into a new node z = [x & y].

Now, if the input is already series-parallel, the result will be one node. In general, however, this will leave a graph that embeds an N-shaped structure like b → c, b → g, f → g from the last example in the question. This structure must be addressed by adding one or more of b → f, c → f, c → g, f → b, f → c, g → c. But in a different instance, this act would in turn create new N-shaped structures. There's no obvious notion of a closure, which is why this problem feels hard to me.

Some of these choices seem worse than others. For example, c → f forces the sequence b → c → f → g, whereas f → c is the only choice that doesn't increase the length of the critical path.

I guess what I'd try is,

  1. If heuristics 1 and 2 have no targets, form a graph with edges x--y if and only if x and y have either a common dependent or a common dependency, compute the connected components of this graph, and &-merge the smallest component that isn't a singleton, followed by another transitive reduction.
like image 104
David Eisenstat Avatar answered Nov 09 '22 02:11

David Eisenstat


Here's a solution i came up with (pseudocode):

sequence = []
for each (node, depth) in depthFirstSearch(graph)
  sequence[depth].push(node)
return sequence

The sequence defines the order to process the graph. If an item in it contains more than one node, they can be processed concurrently.

While this allows for some concurrency, it does not advance as fast as it could. For example, f in the 3rd example in the question would require a to be completed first (as it will be at depth 1, when a and e are depth 0). Ideally work on f could start when e is done.

like image 35
Antti Avatar answered Nov 09 '22 03:11

Antti