Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DiGraph: Nearest node that joins all paths

Some background

I'm analyzing the control flow graph of a function which basically maps incoming data onto outgoing data. Most of the blocks are like this one:

if (input_variable == SPECIFIC_CONSTANT) {
    output_variable = TRUE;
}
else {
    output_variable = FALSE;
}

Typical Control Flow Graph for such code looks like the following graph

digraph G {
  2 -> 3 -> 5;
  2 -> 4 -> 5;
}

Picture of the graphe

where execution of 3 and 4 are conditioned by the value of the input_variable but 5 is independent.

The question

Given a directed graph and a start node, how do I find the nearest node such that any path from the start node goes through this node?

Example: Given this graph how do I find 6 starting from 2 or 12 starting from 8?

Can I reverse a Lowest Common Ancestor algorithm and would it be efficient? Like

for each node in graph:
    ancestors = node.get_all_ancestors()
    lca = find_lowest_common_ancestor(ancestors)
    junction_node[lca] = node

get_junction_point(node):
    return junction_node[node]

My programming language is Python and I just discovered NetworkX, but any algorithm would be appreciated. I am not accustomed to graph theory and I think I miss the basic glossary to find what I'm looking for.

Thanks for your help!

like image 671
Cilyan Avatar asked Nov 12 '13 15:11

Cilyan


1 Answers

Not the most efficient solution, but here's something that should get you started:

Do a DFS, then compute the intersection of all paths (nodes that exist in every path). Then, among those nodes, find the one that appears closest to the beginning in every path:

>>> paths
[]
>>> def dfs(G, s, path):
...     if s not in G or not G[s]:
...             paths.append(path)
...     else:
...             for t in G[s]:
...                     dfs({k:[_t for _t in v if _t!=t] for k,v in G.items()}, t, path+[t])
... 
>>> dfs(G, 2, [])
>>> paths
[[3, 4, 6, 7, 12], [3, 4, 6, 8, 9, 10, 12], [3, 4, 6, 8, 9, 12], [3, 4, 6, 8, 11, 12], [3, 5, 6, 7, 12], [3, 5, 6, 8, 9, 10, 12], [3, 5, 6, 8, 9, 12], [3, 5, 6, 8, 11, 12], [4, 6, 7, 12], [4, 6, 8, 9, 10, 12], [4, 6, 8, 9, 12], [4, 6, 8, 11, 12]]
>>> for path in paths: print(path)
... 
[3, 4, 6, 7, 12]
[3, 4, 6, 8, 9, 10, 12]
[3, 4, 6, 8, 9, 12]
[3, 4, 6, 8, 11, 12]
[3, 5, 6, 7, 12]
[3, 5, 6, 8, 9, 10, 12]
[3, 5, 6, 8, 9, 12]
[3, 5, 6, 8, 11, 12]
[4, 6, 7, 12]
[4, 6, 8, 9, 10, 12]
[4, 6, 8, 9, 12]
[4, 6, 8, 11, 12]
>>> nodes = [set(L) for L in paths]
>>> commons = functools.reduce(set.intersection, nodes)
>>> commons
{12, 6}
>>> min(commons, key=lambda v: max(L.index(v) for L in paths))
6

Now, note how 6 shows up at index 2 in some paths and at index 1 in some other paths. If there was a node (say x), that showed up at index 1 in the paths where 6 shows up at index 2, and at index 2 where 6 shows up at index 1, then, that would be a tie, which this algorithm would break arbitrarily. Depending on your needs, you might want to define how to handle this case better

like image 66
inspectorG4dget Avatar answered Oct 17 '22 16:10

inspectorG4dget