Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Functional topological sort

Tags:

haskell

In the topological sort algorithm, we perform DFS and push nodes onto a linked list as we encounter them. The only way I can think of to do this functionally would be do pass the list as a parameter into the calls, which is ugly and very inefficient (i.e., it shows up in the big O since copying the list is O(n). How would one do this idiomatically in Haskell?

like image 781
Elliot Gorokhovsky Avatar asked Jan 05 '23 11:01

Elliot Gorokhovsky


1 Answers

The fgl paper discusses this exact question as its first example of how to do graph algorithms in a functional style:

dfs :: [Node] -> Graph a b -> [Node]
dfs []     g        = []
dfs (v:vs) (c &v g) = v:dfs (suc c++vs) g
dfs (v:vs) g        = dfs vs g

[Editor's note: the notation c &v g is introduced earlier in the paper as a kind of "pattern match on graphs": c &v g matches graphs containing the vertex v, binding c to a context giving edges in and out of the vertex and g to the graph with that node and all its edges deleted. In the Haskell library, this is realized by the function match.]

The algorithm works as follows. If there are no nodes left to be visited (first case), dfs stops without returning any nodes. In contrast, if there are still nodes that must be visited, dfs tries to locate the context of the first of these nodes (v) in the argument graph. If this is possible (second equation), which is the case whenever v is contained in the argument graph, v is the next node on the resulting node list, and the search continues on the remaining graph g with the successors of v to be visited before the remaining list of nodes vs. The fact that the successors are put in front of all other nodes causes dfs to favor searching in the depth and not in the breadth. Finally, if v cannot be matched (last line), dfs continues the search with the remaining list of nodes vs. Note that the last case can only occur if v is not contained in the graph because otherwise the pattern in the second equation would have matched.

As copy-and-paste wasn't working properly, I had to transcribe this text; any spelling mistakes are almost certainly mine rather than Martin Erwig's.

As you suggest, the list of nodes left to visit is passed in as a parameter. However, your claim that this is inefficient seems off-target to me; the cost of constructing the new list suc c++vs is O(length (suc c)) -- but then you will have to visit all those nodes at least once to complete your depth-first search anyway, so that asymptotic cost is unavoidable.

The full paper linked above spends quite some time discussing asymptotics and efficiency, and is also (in my opinion) quite enlightening, so I encourage you to give it a read.

like image 143
Daniel Wagner Avatar answered Jan 10 '23 16:01

Daniel Wagner