Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encoding recursive tree-creation with while loop + stacks

Tags:

scala

I'm a bit embarassed to admit this, but I seem to be pretty stumped by what should be a simple programming problem. I'm building a decision tree implementation, and have been using recursion to take a list of labeled samples, recursively split the list in half, and turn it into a tree.

Unfortunately, with deep trees I run into stack overflow errors (ha!), so my first thought was to use continuations to turn it into tail recursion. Unfortunately Scala doesn't support that kind of TCO, so the only solution is to use a trampoline. A trampoline seems kinda inefficient and I was hoping there would be some simple stack-based imperative solution to this problem, but I'm having a lot of trouble finding it.

The recursive version looks sort of like (simplified):

private def trainTree(samples: Seq[Sample], usedFeatures: Set[Int]): DTree = {
  if (shouldStop(samples)) {
    DTLeaf(makeProportions(samples))
  } else {
    val featureIdx = getSplittingFeature(samples, usedFeatures)
    val (statsWithFeature, statsWithoutFeature) = samples.partition(hasFeature(featureIdx, _))
    DTBranch(
      trainTree(statsWithFeature, usedFeatures + featureIdx), 
      trainTree(statsWithoutFeature, usedFeatures + featureIdx),
      featureIdx)
  }
}

So basically I'm recursively subdividing the list into two according to some feature of the data, and passing through a list of used features so I don't repeat - that's all handled in the "getSplittingFeature" function so we can ignore it. The code is really simple! Still, I'm having trouble figuring out a stack-based solution that doesn't just use closures and effectively become a trampoline. I know we'll at least have to keep around little "frames" of arguments in the stack but I would like to avoid closure calls.

I get that I should be writing out explicitly what the callstack and program counter handle for me implicitly in the recursive solution, but I'm having trouble doing that without continuations. At this point it's hardly even about efficiency, I'm just curious. So please, no need to remind me that premature optimization is the root of all evil and the trampoline-based solution will probably work just fine. I know it probably will - this is basically a puzzle for it's own sake.

Can anyone tell me what the canonical while-loop-and-stack-based solution to this sort of thing is?

UPDATE: Based on Thipor Kong's excellent solution, I've coded up a while-loops/stacks/hashtable based implementation of the algorithm that should be a direct translation of the recursive version. This is exactly what I was looking for:

FINAL UPDATE: I've used sequential integer indices, as well as putting everything back into arrays instead of maps for performance, added maxDepth support, and finally have a solution with the same performance as the recursive version (not sure about memory usage but I would guess less):

private def trainTreeNoMaxDepth(startingSamples: Seq[Sample], startingMaxDepth: Int): DTree = {
  // Use arraybuffer as dense mutable int-indexed map - no IndexOutOfBoundsException, just expand to fit
  type DenseIntMap[T] = ArrayBuffer[T]
  def updateIntMap[@specialized T](ab: DenseIntMap[T], idx: Int, item: T, dfault: T = null.asInstanceOf[T]) = {
    if (ab.length <= idx) {ab.insertAll(ab.length, Iterable.fill(idx - ab.length + 1)(dfault)) }
    ab.update(idx, item)
  }
  var currentChildId = 0 // get childIdx or create one if it's not there already
  def child(childMap: DenseIntMap[Int], heapIdx: Int) =
    if (childMap.length > heapIdx && childMap(heapIdx) != -1) childMap(heapIdx)
    else {currentChildId += 1; updateIntMap(childMap, heapIdx, currentChildId, -1); currentChildId }
  // go down
  val leftChildren, rightChildren = new DenseIntMap[Int]() // heapIdx -> childHeapIdx
  val todo = Stack((startingSamples, Set.empty[Int], startingMaxDepth, 0)) // samples, usedFeatures, maxDepth, heapIdx
  val branches = new Stack[(Int, Int)]() // heapIdx, featureIdx
  val nodes = new DenseIntMap[DTree]() // heapIdx -> node
  while (!todo.isEmpty) {
    val (samples, usedFeatures, maxDepth, heapIdx) = todo.pop()
    if (shouldStop(samples) || maxDepth == 0) {
      updateIntMap(nodes, heapIdx, DTLeaf(makeProportions(samples)))
    } else {
      val featureIdx = getSplittingFeature(samples, usedFeatures)
      val (statsWithFeature, statsWithoutFeature) = samples.partition(hasFeature(featureIdx, _))
      todo.push((statsWithFeature, usedFeatures + featureIdx, maxDepth - 1, child(leftChildren, heapIdx)))
      todo.push((statsWithoutFeature, usedFeatures + featureIdx, maxDepth - 1, child(rightChildren, heapIdx)))
      branches.push((heapIdx, featureIdx))
    }
  }
  // go up
  while (!branches.isEmpty) {
    val (heapIdx, featureIdx) = branches.pop()
    updateIntMap(nodes, heapIdx, DTBranch(nodes(child(leftChildren, heapIdx)), nodes(child(rightChildren, heapIdx)), featureIdx))
  }
  nodes(0)
}
like image 473
lvilnis Avatar asked May 21 '12 01:05

lvilnis


People also ask

How do you make a recursive tree?

Steps to solve recurrence relation using recursion tree method: Draw a recursive tree for given recurrence relation. Calculate the cost at each level and count the total no of levels in the recursion tree. Count the total number of nodes in the last level and calculate the cost of the last level.

How do you approach a recursive tree problem?

Coding the first approach To start a recursive solution, we have to consider base cases. If the nodes of both trees are null at the same point, then we can return true . If the node of one tree is null , but the other tree is not null , then we know the trees are unequal, so we can return false.

Can you iterate through a tree?

Traversing a tree involves iterating over all nodes in some manner. Because from a given node there is more than one possible next node (it is not a linear data structure), then, assuming sequential computation (not parallel), some nodes must be deferred—stored in some way for later visiting.

How do you recurse a binary tree?

Recursive preorder traversal of binary tree In recursive preorder traversal, we first process the root node, then process all nodes in the left subtree, and finally, we process all nodes in the right subtree.


1 Answers

Just store the binary tree in an array, as described on Wikipedia: For node i, the left child goes into 2*i+1 and the right child in to 2*i+2. When doing "down", you keep a collection of todos, that still have to be splitted to reach a leaf. Once you've got only leafs, to go upward (from right to left in the array) to build the decision nodes:

Update: A cleaned up version, that also supports the features stored int the branches (type parameter B) and that is more functional/fully pure and that supports sparse trees with a map as suggested by ron.

Update2-3: Make economical use of name space for node ids and abstract over type of ids to allow of large trees. Take node ids from Stream.

sealed trait DTree[A, B]
case class DTLeaf[A, B](a: A, b: B) extends DTree[A, B]
case class DTBranch[A, B](left: DTree[A, B], right: DTree[A, B], b: B) extends DTree[A, B]

def mktree[A, B, Id](a: A, b: B, split: (A, B) => Option[(A, A, B)], ids: Stream[Id]) = {
  @tailrec
  def goDown(todo: Seq[(A, B, Id)], branches: Seq[(Id, B, Id, Id)], leafs: Map[Id, DTree[A, B]], ids: Stream[Id]): (Seq[(Id, B, Id, Id)], Map[Id, DTree[A, B]]) =
    todo match {
      case Nil => (branches, leafs)
      case (a, b, id) :: rest =>
        split(a, b) match {
          case None =>
            goDown(rest, branches, leafs + (id -> DTLeaf(a, b)), ids)
          case Some((left, right, b2)) =>
            val leftId #:: rightId #:: idRest = ids
            goDown((right, b2, rightId) +: (left, b2, leftId) +: rest, (id, b2, leftId, rightId) +: branches, leafs, idRest)
        }
    }

  @tailrec
  def goUp[A, B](branches: Seq[(Id, B, Id, Id)], nodes: Map[Id, DTree[A, B]]): Map[Id, DTree[A, B]] =
    branches match {
      case Nil => nodes
      case (id, b, leftId, rightId) :: rest =>
        goUp(rest, nodes + (id -> DTBranch(nodes(leftId), nodes(rightId), b)))
    }

  val rootId #:: restIds = ids
  val (branches, leafs) = goDown(Seq((a, b, rootId)), Seq(), Map(), restIds)
  goUp(branches, leafs)(rootId)
}

// try it out

def split(xs: Seq[Int], b: Int) =
  if (xs.size > 1) {
    val (left, right) = xs.splitAt(xs.size / 2)
    Some((left, right, b + 1))
  } else {
    None
  }

val tree = mktree(0 to 1000, 0, split _, Stream.from(0))
println(tree)
like image 132
Thipor Kong Avatar answered Jan 03 '23 12:01

Thipor Kong