There's a famous algorithm to find the strongly connected components called Kosaraju's algorithm
, which uses two DFS's to solve this problem, and runs in θ(|V| + |E|)
time.
First we use DFS on complement of the graph (G
R) to compute reverse postorder of vertices, and then we apply second DFS on the main graph G
by taking vertices in reverse post order to compute the strongly connected components.
Although I understand the mechanics of the algorithm, I'm not getting the intuition behind the need of the reverse post order.
How does it helps the second DFS to find the strongly connected components?
suppose result of the first DFS is:
----------v1--------------v2-----------
where "-" indicates any number and all the vertices in a strongly connected component g appear between v1 and v2.
DFS by post order gives the following guarantee that
in one word, the first DFS ensures that in the second DFS, strongly connected components that are visited earlier cannot have any edge points to other unvisited strongly connected components.
let's simplify the graph as follow:
the situation in which this algorithm could fail can be conclude as
origin graph would be like g-->v
, and the reversed graph looks like g'<--v
.
To start the second DFS from v, the post order generated by first DFS need to be something like
g1, g2, g3, ..., v
but you would easily find out that neither starting the first DFS from v nor from g' can give you such a post order, so in this situation, it is guaranteed be the first DFS that the second DFS would not start from a vertex that both be out of and points to a strongly connected component.
similar to the situation 1, in situation 2, where the origin graph is g<--v
and the reversed on is g'-->v
, it is guaranteed that v would be visited before any vertex in g'.
When you run DFS on a graph for the first time, for every node you visit you get the knowledge about all nodes that are reachable from that node (you get this information after the first DFS is finished).
Then, when you inverse all the vertices and run the DFS once more, for every node you visit you get the knowledge about all nodes that can reach that node in the non-inverted graph (again, you get this info after the second DFS finished).
Example: let's say your first DFS reaches node X. From that node "you can see" all the neighbours you can visit. (I hope this is pretty understandable). Then, let's say your second DFS reaches that node X, but this time all the vertices are inverted. If then from your node X "you can see" any other nodes, it means that before inverting the vertices the node X was reachable from all the neighbours you see now. By calling the second DFS in the correct order you get for every node X all the nodes that where reachable from X in both DFS trees (and so, for every node X you get the nodes that were both reachable from X and could reach X - those are strongly connected components by definition).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With