
Algorithm to Render a Horizontal Binary-ish Tree in Text/ASCII form

It's a pretty normal binary tree, except that either child of a node may be empty.

I'd like to find a way to output it in a horizontal way (that is, the root node is on the left and expands to the right).

I've had some experience expanding trees vertically (root node at the top, expanding downwards), but I'm not sure where to start, in this case.

Preferably, it would follow these couple of rules:

  • If a node has only one child, it can be skipped as redundant (an "end node", with no children, is always displayed)
  • All nodes of the same depth must be aligned vertically; all nodes must be to the right of all less-deep nodes and to the left of all deeper nodes.
  • Nodes have a string representation which includes their depth.
  • Each "end node" has its own unique line; that is, the number of lines is the number of end nodes in the tree, and when an end node is on a line, there may be nothing else on that line after that end node.
  • As a consequence of the last rule, the root node might be better off in either the top left or the bottom left corner; top left is preferred.

For example, this is a valid tree, with six end nodes (node is represented by a name, and its depth): EDIT: Please see bottom of question for an alternative, easier rendering

        
[a0]-----------[b3]------[c5]------[d8]
    \              \         \----------[e9]
     \              \----[f5]
      \-[g1]--------[h4]------[i6]
            \           \--------------------[j10]
             \-[k3]

Which represents the vertical, explicit binary tree:

0              a
              / \
1            g   *
            / \   \
2          *   *   *
          /     \   \
3        k       *   b
                /   / \
4              h   *   *
              / \   \   \
5            *   *   f   c
            /     \     / \
6          *       i   *   *
          /           /     \
7        *           *       *
        /           /         \
8      *           *           d
      /           /
9    *           e
    /
10 j

(branches folded for compactness; * representing redundant, one-child nodes; note that *'s are actual nodes, storing one child each, just with names omitted here for presentation sake)

(also, to clarify, I'd like to generate the first, horizontal tree; not this vertical tree)

I say language-agnostic because I'm just looking for an algorithm; I say ruby because I'm eventually going to have to implement it in ruby anyway.

Assume that each Node data structure stores only its id, a left node, and a right node.

A master Tree class keeps track of all nodes and has adequate algorithms to find:

  • A node's nth ancestor
  • A node's nth descendant
  • All end-node descendants of a node, and their count
  • The generation of a node
  • The lowest common ancestor of two given nodes

I already know:

  • The number of end nodes

Does anyone have any ideas of where I could start? Should I go for the recursive approach? Iterative? Some pseudo-code would be pretty cool too, and much appreciated =)


progress

As per walkytalky's suggestion, I decided to see what it would look like to map each "relevant" or significant node to a grid, with the columns being the depth and the rows identifiable by their end nodes. Here is what happens (skipping column 7 because there are no significant nodes in depth 7):

depth: 0  1  2  3  4  5  6  8  9  10
       a        b     c     d
                               e
                      f
          g        h     i
                                  j
                k

It should be easy enough to generate this grid, with either breadth-first or depth-first searches. Perhaps most trivially by simply keeping a 2D array and placing every significant node found into it, inserting a row for every "second child".
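A minimal Ruby sketch of that placement step, assuming a Node struct with id, left and right as described earlier. The names pad, example_tree and place are made up for illustration, and I'm assuming (from the diagrams) that the right child stays on the parent's row while the left child starts a new row:

```ruby
Node = Struct.new(:id, :left, :right)

# Hypothetical helper: wraps node in n unnamed one-child nodes.
def pad(node, n)
  n.times { node = Node.new(nil, nil, node) }
  node
end

# The example tree from the question (Node.new(id, left, right)).
def example_tree
  leaf = ->(id) { Node.new(id) }
  h = Node.new("h", pad(leaf["j"], 5), pad(leaf["i"], 1))
  g = Node.new("g", pad(leaf["k"], 1), pad(h, 2))
  c = Node.new("c", pad(leaf["e"], 3), pad(leaf["d"], 2))
  b = Node.new("b", pad(leaf["f"], 1), pad(c, 1))
  Node.new("a", g, pad(b, 2))
end

# Depth-first walk, same-row child first. Appends [id, depth, row] to cells
# for every significant node and returns the number of rows the subtree uses.
def place(node, depth, row, cells)
  kids = [node.right, node.left].compact
  cells << [node.id, depth, row] unless kids.size == 1  # skip one-child nodes
  return 1 if kids.empty?                               # an end node owns one row
  used = 0
  kids.each { |kid| used += place(kid, depth + 1, row + used, cells) }
  used
end

cells = []
rows = place(example_tree, 0, 0, cells)
puts "#{rows} rows"
cells.each { |id, depth, row| puts "#{id}: depth #{depth}, row #{row}" }
```

Each second child starts on the row just past everything its sibling used, which is the "insert a row for every second child" idea expressed with a running row count instead of array insertion.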

Now, knowing these facts:

  • The last node in a row must be an end node
  • Children are always to the right, and on the same row or lower, of their parent node.
  • All non-end nodes must have exactly two children
  • Therefore, every non-end node has its first child as the next node to its right on the same row, and its second child as the first node on the row n rows below it, where n is the number of end nodes under the first child.

We can see that, given any valid grid, there is one unambiguous way to "connect the dots", so to speak; there is one unambiguous tree being represented.
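To convince myself of this, here is a rough Ruby sketch that recovers the parent/child links from the grid alone. It assumes the grid is given as one array of labels per row, left to right, and the name connect is hypothetical:

```ruby
# rows: one array per grid row, holding the labels on that row from left to right.
# Appends [parent, child] pairs to edges and returns the number of rows
# occupied by the subtree rooted at rows[r][i].
def connect(rows, r, i, edges)
  node = rows[r][i]
  return 1 if i == rows[r].size - 1      # last on its row, so it is an end node
  edges << [node, rows[r][i + 1]]        # first child: next node on the same row
  used = connect(rows, r, i + 1, edges)
  edges << [node, rows[r + used][0]]     # second child: starts the row just below
  used + connect(rows, r + used, 0, edges)
end

grid_rows = [%w[a b c d], %w[e], %w[f], %w[g h i], %w[j], %w[k]]
edges = []
connect(grid_rows, 0, 0, edges)
edges.each { |parent, child| puts "#{parent} -> #{child}" }
```

The column positions are not needed to recover the links, only the row order; the columns matter again when drawing.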

Now, the "connecting the dots" is no longer a binary-tree-structure question...it's simply a decoration question. We just need to build an algorithm to properly place the right -'s and \'s where they can go, perhaps following only simple grid/lexicographical rules, instead of binary-tree-structure rules.

Basically, this means that the problem of rendering a tree is now the much simpler problem of rendering a grid, with fancy decorations.

Can anyone suggest any way of formulating these rules? Or maybe a completely different method altogether?


edit

I have conceived of a much, much easier final rendering:

--d0----d1----d3----d4----d5----d6----d8----d9----d10-- => guide line (not rendered)

 [a0 ]-------[b3 ]-------[c5 ]-------[d8 ]
   |           |           \---------------[e9 ]
   |           \---------[f5 ]
   \---[g1 ]-------[h4 ]-------[i6 ]
         |           \---------------------------[j10]
         \---[k3 ]

--d0----d1----d3----d4----d5----d6----d8----d9----d10-- => guide line (not rendered)

It might be easier to try to create this one, instead of the one I had posted earlier. For one, it preserves a pretty grid shape, and you don't have to fiddle with diagonal lines. The rows are all mapped along clearly visible column lines. Unfortunately, it is nowhere near as pretty as the first.

Justin L. asked Jun 16 '10 20:06


2 Answers

If there are N end nodes, there must be N-1 internal nodes with 2 children. (There can be any number of internal nodes with 1 child, which we will have to count to get the depths but otherwise ignore.) Generating the tree is thus equivalent to positioning these nodes on a grid, where:

  • the number of rows in the grid is N
  • I think the number of columns is between 1+floor(log2(N)) and 2*N-1, depending on how much overlap there is; this probably doesn't matter much for our purposes, though
  • each endpoint appears on a different row
  • all nodes at the same depth appear in the same column
  • all internal nodes appear on the same row as their rightmost descendant endpoint
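The N versus N-1 relationship is easy to sanity-check with a small counter. This is only a sketch, assuming a Node struct with id, left and right; the name tally and the tiny tree are made up for the example:

```ruby
Node = Struct.new(:id, :left, :right)

# Counts nodes by number of children: counts[0] is end nodes,
# counts[1] is skippable one-child nodes, counts[2] is branching nodes.
def tally(node, counts = Hash.new(0))
  kids = [node.left, node.right].compact
  counts[kids.size] += 1
  kids.each { |kid| tally(kid, counts) }
  counts
end

# a has children g (end node) and an unnamed one-child node leading to b;
# b has two end nodes, f and c.
b = Node.new("b", Node.new("f"), Node.new("c"))
root = Node.new("a", Node.new("g"), Node.new(nil, nil, b))

counts = tally(root)
puts "end nodes: #{counts[0]}, two-child nodes: #{counts[2]}, one-child nodes: #{counts[1]}"
```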

So, let's see:

  • Walk the tree depth-first, right-to-left.
  • For each endpoint, record its depth and label.
  • For each 2-child internal, record its depth, label and the indices of both rightmost and leftmost child endpoints.
  • Sort the whole lot by depth -- this gives you the column ordering, with the number of distinct depths giving the actual number of columns. (All other ordering should come out automatically from the walk, I think, but that's not the case here because any branch can be any depth.)
  • Place all the nodes in the grid.
  • Mark empty cells to the right of each non-endpoint node as horizontal branches.
  • Mark empty cells down from each internal node to the row above its left child as vertical branches, and the cell at the level of the left child as a junction.

  • Print with appropriate ASCII decoration.
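Here is roughly how I'd expect the steps up to (but not including) printing to look in Ruby. Treat it as a sketch under assumptions: Node is a struct with id, left and right; pad, example_tree, walk and build_grid are hypothetical names; cells hold a label string, :h, :v, :j or nil; and hyphens are also filled in to the right of a junction, which the steps above leave implicit.

```ruby
Node = Struct.new(:id, :left, :right)

# Hypothetical helper: wraps node in n unnamed one-child nodes.
def pad(node, n)
  n.times { node = Node.new(nil, nil, node) }
  node
end

# The example tree from the question (Node.new(id, left, right)).
def example_tree
  leaf = ->(id) { Node.new(id) }
  h = Node.new("h", pad(leaf["j"], 5), pad(leaf["i"], 1))
  g = Node.new("g", pad(leaf["k"], 1), pad(h, 2))
  c = Node.new("c", pad(leaf["e"], 3), pad(leaf["d"], 2))
  b = Node.new("b", pad(leaf["f"], 1), pad(c, 1))
  Node.new("a", g, pad(b, 2))
end

# Records [id, depth, row] for displayed nodes and [depth, row, left_row]
# for two-child nodes; returns the number of rows the subtree occupies.
def walk(node, depth, row, shown, forks)
  kids = [node.right, node.left].compact
  shown << [node.id, depth, row] unless kids.size == 1
  return 1 if kids.empty?
  used = walk(kids[0], depth + 1, row, shown, forks)
  if kids.size == 2
    forks << [depth, row, row + used]
    used += walk(kids[1], depth + 1, row + used, shown, forks)
  end
  used
end

def build_grid(root)
  shown, forks = [], []
  rows = walk(root, 0, 0, shown, forks)
  # Sorting the distinct depths gives the column order.
  col = shown.map { |_, d, _| d }.uniq.sort.each_with_index.to_h
  grid = Array.new(rows) { Array.new(col.size) }
  shown.each { |id, d, r| grid[r][col[d]] = "#{id}#{d}" }
  forks.each do |d, r, left_row|
    (r + 1...left_row).each { |i| grid[i][col[d]] = :v }  # verticals below the node
    grid[left_row][col[d]] = :j                            # junction on the child's row
  end
  grid.each do |line|
    first = line.index { |c| c == :j || c.is_a?(String) }
    last = line.rindex { |c| c.is_a?(String) }             # the row's end node
    (first..last).each { |i| line[i] ||= :h }              # horizontals in between
  end
  grid
end

build_grid(example_tree).each { |line| p line }
```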

Update:

As you say, the positioning is enough to unambiguously determine the connections, but you still need to do some bottom-up work to get that right, so I'd probably still do the "mark" steps during the grid building.

I sort of thought the printing was trivial enough to gloss over, but:

  • Iterate down each column and determine the column width as size of fixed elements + max label length + number of digits in the depth, i.e. floor(log10(depth)) + 1 for depth > 0 and 1 for depth 0. (Fixed elements might be [ and ]-, for example. We can substitute ]\n as the suffix for endpoints.)
  • For each row
    • for each column
      • if cell contains a node or endpoint
        • print fixed prefix
        • print label
        • print depth
        • print fill spaces (max label length - current label length)
        • print appropriate suffix
        • if node is an endpoint, skip to next row
      • if cell is empty, print fill spaces to width of column
      • if cell contains a vertical, print some chosen prefix number of spaces, a bar, and fill with spaces
      • if cell contains a junction, print some chosen prefix number of spaces, a backslash, and fill with hyphens
      • if cell contains a horizontal, print full column width of hyphens
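A Ruby sketch of that printing loop, under a few assumptions of mine: the grid is an array of equal-length rows whose cells are a label string (depth already appended), :h, :v, :j or nil; the fixed elements are "[" and "]" plus one connector hyphen; and render is a hypothetical name.

```ruby
# Renders a marked grid to text, one line per row.
def render(grid)
  # Column width: widest label in the column plus "[", "]" and one connector.
  widths = grid.transpose.map { |column| column.grep(String).map(&:length).max.to_i + 3 }
  grid.map do |line|
    last = line.rindex { |cell| cell.is_a?(String) }   # the end node closes the row
    line[0..last].each_with_index.map do |cell, i|
      w = widths[i]
      case cell
      when String then "[#{cell.ljust(w - 3)}]" + (i == last ? "" : "-")
      when :h then "-" * w                 # horizontal: full width of hyphens
      when :v then " |".ljust(w)           # vertical: bar, then spaces
      when :j then " \\".ljust(w, "-")     # junction: backslash, then hyphens
      else " " * w                         # empty cell
      end
    end.join
  end.join("\n")
end

grid = [
  ["a0", :h,   "b2", "d3", nil],
  [:v,   nil,  :j,   :h,   "e10"],
  [:j,   "f1", nil,  nil,  nil]
]
puts render(grid)
```

Stopping each row at its end node takes the place of the "skip to next row" step, and padding labels inside the brackets keeps the columns aligned when label lengths differ.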

Converting this to print diagonals might be easiest if you generate the straight version first and then do some substitutions in the character array -- otherwise you can get cases where you're rendering a long vertical branch in a different column than the one in which it originated.

At some point I may try to put this into code, but it probably won't be today -- stuff to do!

walkytalky answered Nov 05 '22 05:11


Looks like an interesting problem; I'd be happy to give it a try, if I had more time.

I'd probably go with the following approach:

  1. Start rendering "right" (or in your case, "top") nodes, until I reach the end. (i.e.: render a, b, c, and d)
  2. Go back to the last node with a child (i.e.: c), and do the same thing recursively

You would have to keep a global variable indicating on which row you are printing. Each recursive call increases this variable.

edit: ok, couldn't resist trying to write some untested pseudo-code, hope it works:

function print_tree(Node n) {
    print "\n" // begin on a fresh new line
    childs = new Array();
    do {
        if (n.hasLeftChild) {
            childs.push(n.leftChild)
        }
        print "---" + n.id    //this needs a lot of tweaking, but you get the idea
    } while(n = n.rightChild)
    childs.reverse()
    foreach(child in childs) {
        print_tree(child);
    }
}
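
For what it's worth, the same idea might look like this in Ruby. This is a sketch rather than a tested port: it assumes a Node struct with id, left and right, collects lines in an array instead of using a global row counter, follows a lone child on the same row whichever side it is on, and indents by a fixed (arbitrary) CELL width per depth level, so labels are assumed to fit within CELL characters. The names tree_lines and CELL are made up.

```ruby
Node = Struct.new(:id, :left, :right)
CELL = 6  # characters per depth level (an arbitrary choice for this sketch)

# Returns one string per end node; children that start a new row are
# deferred and handled deepest first, as in the pseudo-code above.
def tree_lines(node, depth = 0, lines = [])
  line = " " * (CELL * depth)
  pending = []
  while node
    kids = [node.right, node.left].compact
    case kids.size
    when 0 then line << "[#{node.id}#{depth}]"    # end node closes the row
    when 1 then line << "-" * CELL                # one-child node: just a connector
    else
      line << "[#{node.id}#{depth}]".ljust(CELL, "-")
      pending << [kids[1], depth + 1]             # second child gets its own row later
    end
    node = kids[0]
    depth += 1
  end
  lines << line
  pending.reverse_each { |child, d| tree_lines(child, d, lines) }
  lines
end

b = Node.new("b", Node.new("e"), Node.new("d"))
root = Node.new("a", Node.new("f"), Node.new(nil, nil, b))
puts tree_lines(root)
```

It does not draw the vertical connectors between rows; those would still need the grid-style marking described in the other answer.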
gregschlom answered Nov 05 '22 04:11