Recently I needed to implement non-recursive DFS as part of a more complicated algorithm, Tarjan's algorithm to be precise. The recursive implementation is very elegant, but not suitable for large graphs. When I implemented the iterative version, I was shocked at how inelegant it finally ended up being, and I was wondering if I had done something wrong.
There's two basic approaches to iterative DFS. First, you can push all the children of a node at once onto the stack (seems by far more common). Or you can just push one. I will focus on the first one as that seems how everyone does it.
I had various problems with this algorithm, and eventually I realized that to do it efficiently, I needed not 1, not 2, but 3 boolean flags (I don't necessarily mean you need three explicit boolean variables, you might store the information indirectly via special values of variables that are usually integers, but you need to access those 3 pieces of information one way or another. The three flags were: 1) visited. This was to prevent children from being pushed onto the stack very redundantly. 2) Done. To prevent redundant processing of the same node. 3) Ascending/descending. To indicate whether the children had already been pushed onto the stack. The pseudocode looks something like this:
while(S)
if S.peek().done == True
S.pop()
continue
S.peek().visited = True
if S.peek().descending == True
S.peek().descending = False
for c in S.peek().children
if c.visited == False
S.push(c)
doDescendingStuff()
else
w = S.pop()
w.done = True
doAscendingStuff()
Some notes: 1) You don't need ascending/descending technically, as you could just see if the children are all done or not. But it's pretty inefficient in a dense graph.
2), The main kicker: The visited/done thing might not seem necessary. Here's why (I think) you need it. You can't mark things visited until you visit them on the stack. If you do, you can process things in the wrong order. E.g. suppose A is linked to B and C, B links to D, and D links to C. Then from A, you will push B and C on the stack. From B you push D on the stack... and then what? If you are marking things visited when you push them on the stack, you won't push C on the stack here. But this is wrong, C should be visited from D, not from A in this graph (assuming that A visits B before C). So, you don't mark things visited until you process them. But then, you will have C on the stack twice. So you need another flag to show you are completely done with it, so you don't process C a second time.
I don't see how to avoid all this to having a perfectly correct non-recursive DFS that supports actions both winding and unwinding. But instinctively it feels crufty. Is there a better way? Almost every place that I have consulted online really glosses over how to actually implement non-recursive DFS, saying that it can be done and providing a very basic algorithm. When the algorithm is correct (in terms of properly supporting multiple paths to the same node) which is rare, it rarely properly supports doing stuff both on winding and unwinding.
I think the most elegant stack-based implementation would have iterators of children on the stack, rather than nodes. Think of an iterator just as storing a node and a position in its children.
while (!S.empty)
Iterator i = S.pop()
bool found = false
Iterator temp = null
while (i.hasNext())
Node n = i.next()
if (n.visited == false)
n.visited = true
doDescendingStuff(n)
temp = n.getChildrenIterator()
break
if (!i.hasNext())
doAscendingStuff(i.getNode())
else
S.push(i)
if (temp != null)
S.push(temp)
The above could be optimised i.t.o storage space by separating the node and position onto 2 stacks.
Your code does not fully emulate what happens with the recursive DFS implementation. In the recursive DFS implementation, every node appears only once in the stack at any time.
The solution given by Dukeling is a way to do it iteratively. Basically, you have to push only one node at a time in the stack, instead of all at once.
Your assertion that this would need more storage is wrong: with your implementation, a node can be multiple times on a stack. In fact, if you start from a very dense graph (the complete graph on all vertices) this WILL happen. With Dukeling solution, the size of the stack is O(number of vertices). In your solution, it is O(number of edges).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With