Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Total number of paths in directed acyclic graph containing a specific link

I've been trying to code up an algorithm that takes a directed set of nodes (I have expressed as a sparse directed adjacency matrix, for now) say, A, B, C, and D, that, when called, gives me all possible paths that contain a given path (such as AB or AD). A node cannot connect to itself, and all nodes are directed to flow from A to D, eventually.

I've made an attempt at coding a recursive python script with limited success so far -- my graph theory isn't strong (I actually have zero background). Any help anyone could provide as far as the direction I should go would be appreciated.

As a disclaimer- this isn't homework (I'm just trying process a large data set and build a personal library for some research), and I've looked at "similar questions" for several hours, but largely to no avail (except for the aforementioned recursive python script).

Thanks.

like image 973
user2388189 Avatar asked Jan 18 '26 23:01

user2388189


2 Answers

Start by solving the simpler problem: given two points A and B in a DAG can you count all the paths that start with A and end with B? (A "path" is defined as a list of edges where the end node of one is equal to the start node of the next.)

Sounds hard. Can we simplify it?

Well clearly the most trivial case is where A and B are actually the same node. In that case there are zero paths because the graph is acyclic.

Suppose then that A and B are different. WOLOG suppose A has exactly two neighbours C and D, neither of which are B. The number of paths from A to B must be equal to the number of paths from C to B, plus the number of paths from D to B.

More generally: if A has n neighbours none of which are B then find the number of paths from each neighbour to B and sum them.

If A does have neighbour B then don't forget to add one to the total for the path A->B.

Now we've split this up into many sub-problems, and each is strictly smaller than the previous problem.

You'd think then that a straightforward recursive solution would do the trick, but unfortunately it does not. Consider this graph:

                    A
                   / \
                  C   D
                   \ /
                    E
                   / \
                  F   G
                   \ /
                    H
                   / \
                  I   J
                   \ /
                    B
  • Paths from A to B equals paths from C to B plus paths from D to B. So calculate those.
  • Paths from C to B equals paths from E to B.
  • Paths from D to B equals paths from E to B.

And... we just calculated paths from E to B twice. Which will in turn calculate paths from H to B twice, for a total of four times. The algorithm that I have sketched can end up calculating the same things 2n times where n is proportional to the number of nodes in the graph!

What you need to do is make a memoizer, so that once the answer is calculated once, it is never calculated again.

So, solve the simpler problem by making a recursive, memoized algorithm that computes the total number of paths between two given nodes.

So let's try it out. How many paths are there from A to B? Let's denote that ab. We calculate:

ab = cb + db, but we don't know them...
  cb = eb, but we don't know it...
    eb = fb + gb, but we don't know them...
      fb = hb, but we don't know it...
        hb = ib + jb, but we don't know them...
          ib = 1
          jb = 1
        therefore hb = 2
      therefore fb = 2
    gb = hb, but we already know that is 2
    therefore eb = 4
  therefore cb = 4
  db = eb, but we already know that is 4
therefore ab is 8

And we're done.

Once you can find the number of paths between two nodes then it is straightforward to calculate all paths that contain a given edge. Paths from A to B that pass through E-G for example, is equal to number of paths from A to E multiplied by number of paths from G to B.

Let's try it.

ae = ce + de
  ce = 1
  de = 1
so ae = 2

gb = hb
  hb = ib + jb
    ib = 1
    jb = 1
  so hb = 2
so gb = 2

and there are ae * gb = 4 paths from A to B that go through EG. Let's check our work. The paths are

AC-CE-EG-GH-HI-IB
AC-CE-EG-GH-HI-JB
AD-DE-EG-GH-HI-IB
AD-DE-EG-GH-HI-JB

Yep, there are four.

Make sense?

like image 109
Eric Lippert Avatar answered Jan 20 '26 12:01

Eric Lippert


If you're willing to use an external library, look into networkx and its function all_simple_paths.

Suppose you had a DAG with source s and sink t. To find all paths containing the edge a -> b, find all paths from s to a, and all paths from b to t, and take the cartesian product of the two sets of paths.

like image 41
univerio Avatar answered Jan 20 '26 11:01

univerio



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!