One of my side projects is a decompiler that turns native code into LLVM IR, simplifies it and outputs pseudo-C. An essential phase of the program's process is pattern-independent control flow structuring, which finds regions in a program and turns them into structured control flow statements (i.e. not gotos).
I had to roll out my own code for finding regions because LLVM's regions don't look exactly how the paper expects them. However, finding regions requires you to have a post-dominator tree, and it turns out that LLVM's post-dominator tree building algorithm can't make one for functions that have no end block, like functions that "end" with an endless loop.
This appears to be because the tree-building algorithm needs a starting point. Normally, the starting point is the function's returning block, since it post-dominates every other block; but there isn't any in functions that spin into an endless loop, since they don't have any return
or unreachable
terminator. (At this point, it might be worth noting that LLVM's region code also relies on a post-dominator tree and is also useless for functions for which it can't be built.)
It seems to me that even though this algorithm fails, the fact that a function doesn't return doesn't mean that you can't make a post-dominator tree for it.1 In fact, if that endless loop has a single back-edge (which is something that I can ensure), the node with that back edge necessarily post-dominates every other node in the graph, so it should be possible to make a post-dominator tree.
If I could find that node, I could probably feed it to LLVM's post-dom infrastructure and get a reasonable post-dominator tree out of it. Unfortunately, I'm not very imaginative and the only straightforward way that I can think of to identify that crucial node is that "it's the one that post-dominates everything", which certainly won't help me bootstrap a post-dominator tree.
Finding back edges isn't particularly hard. As Doug Currie says, you can do it with a simple DFS, and in fact, another part of my project does exactly that. However, in the case of a function with an endless loop and nested, terminating loops, I don't know how I would tell the inner back edge from the outer back edge without domination information. (If it can help, at this stage of the process, every loop is guaranteed to have a single entry node and at most one exit node.)
So how can I build the post-dominator tree of a function that doesn't have any returning basic block?
1. My compiler and graph theory background is entirely self-taught. This might not be accurate.
The dominance frontier of a node d is the set of all nodes ni such that d dominates an immediate predecessor of ni, but d does not strictly dominate ni. It is the set of nodes where d's dominance stops. A dominator tree is a tree where each node's children are those nodes it immediately dominates.
By definition, the dominator is the largest number of the right subsequence.
Strictly speaking, when there's an infinite loop (not just an unbounded loop -- a strictly infinite one with no exit), there is no post-dominator, as for every node in the loop, the next node in the loop will execute after it, so there's no "last" node to the loop.
One way of dealing with it, is to do the normal post-dominator finding, except after doing the initial depth-first traversal from the exit, you check for unvisited nodes. If there are any, the exit is unreachable from them (no path from the nodes to the exit), so you pick one at random and add it as a psuedo-exit (as if the node contained a conditional branch to the exit, just the condition is always false) and restart. This gives you a post-dominator tree, but not necessarily a unique one (since you might choose different nodes at random to add the exit branch), but generally that doesn't matter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With