More efficiently compute transitive closures of each dependents while incrementally building the directed graph

I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node.

In other words, given a node in a dependency graph, find the set of sets of direct dependents which transitively have common dependents that derive from that particular starting node.

e.g. given this pseudocode:

let a = 1
let b = 2
let c = a + b
let d = a + b
let e = a
let f = a + e
let g = c + d

You could compute this graph:

Graph diagram (edges point from each node to its dependents): a → c, d, e, f; b → c, d; c → g; d → g; e → f

If we use a as the start node, we can see that among the dependents of a, both c and d share the dependent g, while f is a dependent of both e and a.

Notice that a has no impact on b at all, so it should not be taken into account when deciding how to group the dependents of a.

Using a as the start node, we'd want to get these grouped sets of dependents:

groups = {{c, d}, {e, f}}

c and d have a direct or transitive downstream relation, and so do e and f. But, for example, e and f have no dependent (downstream) relation at all with c or d, either directly or indirectly (transitively). And b doesn't derive from a directly or indirectly, so it should have no impact on deciding our grouping.

Also keep in mind that this graph is small for simplicity. It could be that transitive dependents happen much further down the subgraph than this example happens to have.


I did a bunch of paper research, and indeed there are numerous solutions; however, they don't have the performance characteristics I'm looking for. The graph is incrementally created over time, and at each stage I need to be able to answer this question, so traversing the entire graph each time is a dealbreaker.

I think I have a major advantage that isn't referenced in the various approaches I could find: I have full control over the creation of the graph and the dependents are added in reverse topological order, so the graph is correctly sorted. With that in mind, I considered the obvious solution of computing the answer incrementally (dynamic programming).

I figured a bitmask would be a fast way to store and look up what dependents a given node has. When a dependent is added to a node, I would update that node's mask to include the bits of that dependent (which itself includes the bits of its dependents, and so on):

let maskCounter = 0;

class Node {
  constructor(name) {
    this.name = name;
    this.dependents = [];
    this.mask = 1 << maskCounter;
    maskCounter++;
  }

  addDependent(dependent) {
    // Now our mask contains the bits representing all of
    // its direct and transitive dependents
    this.mask = this.mask | dependent.mask;

    // See whether this dependent shares transitive dependents with
    // any existing groups. It can overlap more than one, in which
    // case those groups all belong together and must be merged.
    const overlapping = this.dependents.filter(
      group => group.mask & dependent.mask
    );

    if (overlapping.length > 0) {
      const [first, ...rest] = overlapping;
      first.mask |= dependent.mask;
      first.values.push(dependent);
      for (const group of rest) {
        first.mask |= group.mask;
        first.values.push(...group.values);
        this.dependents.splice(this.dependents.indexOf(group), 1);
      }
      return;
    }

    // If reached, this dependent has no transitive dependents in
    // common with any of this node's other dependents, so it starts
    // a new group. That's confusing, huh?
    this.dependents.push({
      mask: dependent.mask,
      values: [dependent]
    });
  }
}

The dependents would, however, need to be added in reverse order up the graph, so that the graph is sorted correctly and the nodes at the top of the graph accumulate the masks of all their dependents.

const a = new Node('a');
const b = new Node('b');
const c = new Node('c');
const d = new Node('d');
const e = new Node('e');
const f = new Node('f');
const g = new Node('g');

b.addDependent(c);
b.addDependent(d);
c.addDependent(g);
d.addDependent(g);
e.addDependent(f);
a.addDependent(c);
a.addDependent(d);
a.addDependent(e);
a.addDependent(f);

The bitmasks would incrementally look like this:

b = b 00000010 | c 00000100
b = b 00000110 | d 00001000
c = c 00000100 | g 01000000
d = d 00001000 | g 01000000
e = e 00010000 | f 00100000
a = a 00000001 | c 01000100
a = a 01000101 | d 01001000
a = a 01001101 | e 00110000
a = a 01111101 | f 00100000
===========================
a = 01111101

At the end, a has a mask of 01111101; each set bit represents one of its downstream (direct or transitive) dependents. Notice that the second-to-last bit is not flipped: that's the bit for b, which does not depend on a at all.

If we look at the resulting value of a.dependents we see:

[
  { values: [c, d], mask: 0b01001100 },
  { values: [e, f], mask: 0b00110000 }
]

which provides the answer we're looking for, ultimately a set of sets: a.dependents.map(group => group.values). (This is an array, i.e. a list, but it's being used as a set for simplicity.)

Here's a JSBin: https://jsbin.com/jexofip/edit?js,console

This works, and CPU-wise it's acceptable because I'll very frequently need to know the grouped dependents, but the dependents themselves change far less often.

The example above uses JavaScript for simplicity of demonstration; JavaScript performs bitwise operations on 32-bit signed integers, so we can only create 31 unique nodes this way. We could use an arbitrary-precision integer (e.g. BigInt) to support an "unlimited" number of nodes, but the problem is memory usage.
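For illustration, a BigInt variant of the Node class might look like this. It's just a sketch that mirrors the addDependent logic above with BigInt masks; the grouping semantics are unchanged:

```javascript
// Sketch: the same mask bookkeeping using BigInt, so the node count is
// not capped at 31 by JavaScript's 32-bit bitwise operators.
let maskCounter = 0n;

class Node {
  constructor(name) {
    this.name = name;
    this.dependents = [];
    this.mask = 1n << maskCounter; // each node claims one unique bit
    maskCounter++;
  }

  addDependent(dependent) {
    this.mask |= dependent.mask;

    for (const group of this.dependents) {
      // A non-zero BigInt is truthy, so this works like the int version
      if (group.mask & dependent.mask) {
        group.mask |= dependent.mask;
        group.values.push(dependent);
        return;
      }
    }

    this.dependents.push({ mask: dependent.mask, values: [dependent] });
  }
}

const [a, b, c, d, e, f, g] =
  ['a', 'b', 'c', 'd', 'e', 'f', 'g'].map(n => new Node(n));

b.addDependent(c);
b.addDependent(d);
c.addDependent(g);
d.addDependent(g);
e.addDependent(f);
a.addDependent(c);
a.addDependent(d);
a.addDependent(e);
a.addDependent(f);

// Same grouping as before: [['c', 'd'], ['e', 'f']]
console.log(a.dependents.map(group => group.values.map(n => n.name)));
```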

Because each node needs its own unique bit flipped, the i-th node's mask needs about i bits, so I think the memory usage is:

N * (N + 1) / 2 = bits      (where N = number of nodes)

e.g. 10,000 nodes is about 6.25 megabytes!

That's excluding any platform overhead for using arbitrary precision integers (or a similar custom data structure).
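As a sanity check on those numbers, a quick back-of-the-envelope calculation (maskBytes is a hypothetical helper name, not part of the code above):

```javascript
// Rough memory estimate for the bitmask scheme: the i-th node's mask
// needs about i bits, so the total is 1 + 2 + ... + N = N * (N + 1) / 2 bits.
function maskBytes(n) {
  return (n * (n + 1)) / 2 / 8; // bits -> bytes
}

console.log(maskBytes(10000));  // 6250625 bytes, ~6.25 MB
console.log(maskBytes(100000)); // 625006250 bytes, ~625 MB
```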

In my use case it would be common to have 10k+ nodes, and 100k+ is probable in some cases (625 MB!). It's also possible for new nodes to be created indefinitely, using an unbounded amount of memory. So this solution isn't practical: there is no easy way to "garbage collect" no-longer-used mask bits from nodes that drop off the graph. It's of course possible, but it's the traditional GC problem I'd like to avoid if possible.

Side note: depending on the size and depth of the graph, this might not actually perform well either. Even though bitwise operations themselves are relatively fast, doing them on a BigInt of 100,000 bits for each node at the top of the graph is not. So completely rethinking my approach is also welcome.


Ultimately trading memory for CPU is the usual give and take, but I'm wondering if perhaps I'm missing another approach that strikes a better balance or requires significantly less memory?

There may be other unique considerations I haven't thought of that could be used.

School me!

asked Nov 17 '18 by jayphelps


1 Answer

The relation you want to group by is not an equivalence relation. For example, consider this dependency graph:

a->bcd, bc->e, cd->f

Here, b and c have a common dependent, and so do c and d, but there are no common dependents between b and d. In this case, you probably want to have b, c and d in the same group. However, it gets trickier with this case:

a->bd, bc->e, cd->f

Here, a doesn't depend on c, so you may want to have b and d in separate groups, now that you don't need to care about c. However, there is a class of algorithms which will group b and d together in this case: algorithms which maintain a grouping of all nodes, and use it as the basis for grouping a new node's direct dependents.

One such algorithm uses a disjoint-set structure to efficiently track which nodes are connected. In my example, before processing a, the algorithm will have nodes b, c, d, e, and f all in the same set, so they will be grouped together.

Here is an implementation:

// Disjoint-set "find" with path compression: every node visited on
// the way up is re-pointed directly at the root of its set.
function find(node) {
  return node.parent == null ? node : (node.parent = find(node.parent));
}

// Disjoint-set union by rank: the shallower tree is attached under
// the deeper one, keeping the trees balanced.
function merge(a, b) {
  a = find(a);
  b = find(b);
  if (a.rank < b.rank) {
    a.parent = b;
  } else {
    b.parent = a;
    if (a.rank == b.rank) {
      ++a.rank;
    }
  }
}

class Node {
  constructor(name, dependents) {
    this.name = name;
    this.parent = null;
    this.rank = 0;
    let depMap = new Map();
    for (let d of dependents) {
      let dset = find(d);
      if (!depMap.has(dset)) {
        depMap.set(dset, []);
      }
      depMap.get(dset).push(d);
    }
    output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + '\n';
    for (let d of depMap.keys()) {
    // or: for (let d of dependents) {
      merge(this, d);
    }
  }

  toString() {
    return this.name;
  }
}

let output = '';
const f = new Node('f', []);
const e = new Node('e', [f]);
const d = new Node('d', []);
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;
<pre id=output></pre>
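For what it's worth, here's a DOM-free restatement of the snippet above, applied to the question's original graph (output collected in an array instead of a <pre>, nodes constructed in reverse topological order). It reproduces the grouping the question asks for:

```javascript
// Same union-find approach as above, but standalone. "dependents" are
// the nodes that depend on the node being constructed.
const lines = [];

function find(node) {
  // Path compression: re-point visited nodes directly at the root.
  return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
  // Union by rank: attach the shallower tree under the deeper one.
  a = find(a);
  b = find(b);
  if (a.rank < b.rank) {
    a.parent = b;
  } else {
    b.parent = a;
    if (a.rank == b.rank) ++a.rank;
  }
}

class Node {
  constructor(name, dependents) {
    this.name = name;
    this.parent = null;
    this.rank = 0;
    // Group the direct dependents by which disjoint set each is in.
    const depMap = new Map();
    for (const d of dependents) {
      const dset = find(d);
      if (!depMap.has(dset)) depMap.set(dset, []);
      depMap.get(dset).push(d);
    }
    lines.push(name + ': ' + Array.from(depMap.values())
      .map(a => '{' + a.join(', ') + '}').join(', '));
    for (const d of depMap.keys()) merge(this, d);
  }

  toString() { return this.name; }
}

// The question's graph: c = a + b, d = a + b, e = a, f = a + e, g = c + d.
const g = new Node('g', []);
const c = new Node('c', [g]);
const d = new Node('d', [g]);
const f = new Node('f', []);
const e = new Node('e', [f]);
const b = new Node('b', [c, d]);
const a = new Node('a', [c, d, e, f]);

// Last line printed: "a: {c, d}, {e, f}"
console.log(lines.join('\n'));
```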
answered Sep 19 '22 by abacabadabacaba