Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tarjan's strongly connected components algorithm in python not working

I implemented the Tarjan's strongly connected components algorithm, according to wikipedia, in Python, but it isn't working. The algorithm is quite short and I cannot find any difference, so I cannot tell why it isn't working. I tried to check the original paper, but could not find it.

Here is the code.

def strongConnect(v):
  global E, idx, CCs, c, S
  idx[v] = (c, c) #idx[v][0] for v.index # idx[v][1] for v.lowlink
  c += 1
  S.append(v)  
  for w in [u for (v2, u) in E if v == v2]:
    if idx[w][0] < 0:
      strongConnect(w)
      # idx[w] = (idx[w][0], min(idx[v][1], idx[w][1])) #fixed, thx
      idx[v] = (idx[v][0], min(idx[v][1], idx[w][1]))
    elif w in S:
      idx[v] = (idx[v][0], min(idx[v][1], idx[w][0]))
  if (idx[v][0] == idx[v][1]):
    i = S.index(v)
    CCs.append(S[i:])
    S = S[:i]

E = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'), ('D', 'F'), ('F', 'B'), ('E', 'F')]
idx = {}
CCs = []
c = 0
S = []
for (u, v) in E:
  idx[u] = (-1, -1)
  idx[v] = (-1, -1)
for v in idx.keys():
  if idx[v][0] < 0:
    strongConnect(v)

print(CCs)

You can check the graph visually if you prefer. As you can see this is a quite forward translation of the pseudocode in wikipedia. However, this is the output:

[['D', 'E', 'F'], ['B', 'C'], ['A']]

There should be only one strongly connected component, not three. I hope the question is right in all its aspects, if not I'm sorry. In any case, thank you very much.

like image 835
jmora Avatar asked Jul 04 '11 18:07

jmora


People also ask

What does Tarjan's algorithm return?

Tarjan's Algorithm is an efficient graph algorithm to find the strongly connected components in a directed graph in linear time by utilizing Depth First Search traversal of a graph. The key idea used is that nodes of strongly connected component form a subtree in the DFS spanning tree of the graph.

Which is better kosaraju or Tarjan?

175iq's blog. By what factor is Kosaraju's algorithm for finding strongly connected component slower as compared to Tarjan's algorithm. It appears to me that the factor should be 3 as there are 2 dfs passes in Kosaraju's algorithm and we also have to transpose the graph once.

Why does kosaraju algorithm work?

The transpose graph property is why Kosaraju's algorithm works successfully to search SCCs.


1 Answers

Ok, I had some more time to think about this. I'm no longer certain that filtering the edges was the problem, as I previously stated. In fact, I think there's an ambiguity in the pseudocode; does for each (v, w) in E mean for each edge (as the literal meaning of for each suggests), or only each edge beginning with v, (as you reasonably assumed)? Then, after the for loop, is the v in question the final v from the for loop, as it would be in Python? Or does that go back to being the original v? Pseudocode doesn't have clearly defined scoping behavior in this case! (It would be really weird if the v at the end were to be the last, arbitrary, value of v from the loop. That suggests that filtering is correct, because in that case, v means the same thing all the way through.)

However, under any circumstances, the clear error in your code is here:

  idx[w] = (idx[w][0], min(idx[v][1], idx[w][1]))

According to the pseudocode, that should definitely be

  idx[v] = (idx[v][0], min(idx[v][1], idx[w][1]))

Once you make that change, you get the expected result. Frankly it doesn't surprise me that you made that mistake, because you're using a really weird and counterintuitive data structure. Here's what I think is an improvement -- it adds only a few more lines, and I find it to be much more readable.

import itertools

def strong_connect(vertex):
    global edges, indices, lowlinks, connected_components, index, stack
    indices[vertex] = index
    lowlinks[vertex] = index
    index += 1
    stack.append(vertex)

    for v, w in (e for e in edges if e[0] == vertex):
        if indices[w] < 0:
            strong_connect(w)
            lowlinks[v] = min(lowlinks[v], lowlinks[w])
        elif w in stack:
            lowlinks[v] = min(lowlinks[v], indices[w])

    if indices[vertex] == lowlinks[vertex]:
        connected_components.append([])
        while stack[-1] != vertex:
            connected_components[-1].append(stack.pop())
        connected_components[-1].append(stack.pop())

edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), 
         ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'), 
         ('D', 'F'), ('F', 'B'), ('E', 'F')]
vertices = set(v for v in itertools.chain(*edges))
indices = dict((v, -1) for v in vertices)
lowlinks = indices.copy()
connected_components = []

index = 0
stack = []
for v in vertices:
    if indices[v] < 0:
        strong_connect(v)

print(connected_components)

However, I find the use of global variables here distasteful. You could hide this away in its own module, but I prefer the idea of creating a callable class. After looking more closely at Tarjan's original pseudocode, (which confirms that the "filtered" version is correct, by the way), I wrote this. It includes a simple Graph class and does couple of basic tests:

from itertools import chain
from collections import defaultdict

class Graph(object):
    def __init__(self, edges, vertices=()):
        edges = list(list(x) for x in edges)
        self.edges = edges
        self.vertices = set(chain(*edges)).union(vertices)
        self.tails = defaultdict(list)
        for head, tail in self.edges:
            self.tails[head].append(tail)

    @classmethod
    def from_dict(cls, edge_dict):
        return cls((k, v) for k, vs in edge_dict.iteritems() for v in vs)

class _StrongCC(object):
    def strong_connect(self, head):
        lowlink, count, stack = self.lowlink, self.count, self.stack
        lowlink[head] = count[head] = self.counter = self.counter + 1
        stack.append(head)

        for tail in self.graph.tails[head]:
            if tail not in count:
                self.strong_connect(tail)
                lowlink[head] = min(lowlink[head], lowlink[tail])
            elif count[tail] < count[head]:
                if tail in self.stack:
                    lowlink[head] = min(lowlink[head], count[tail])

        if lowlink[head] == count[head]:
            component = []
            while stack and count[stack[-1]] >= count[head]:
                component.append(stack.pop())
            self.connected_components.append(component)

    def __call__(self, graph):
        self.graph = graph
        self.counter = 0
        self.count = dict()
        self.lowlink = dict()
        self.stack = []
        self.connected_components = []

        for v in self.graph.vertices:
            if v not in self.count:
                self.strong_connect(v)

        return self.connected_components

strongly_connected_components = _StrongCC()

if __name__ == '__main__':
    edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'),
             ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'),
             ('D', 'F'), ('F', 'B'), ('E', 'F')]
    print strongly_connected_components(Graph(edges))
    edge_dict = {'a':['b', 'c', 'd'],
                 'b':['c', 'a'],
                 'c':['d', 'e'],
                 'd':['e'],
                 'e':['c']}
    print strongly_connected_components(Graph.from_dict(edge_dict))
like image 161
senderle Avatar answered Sep 22 '22 12:09

senderle