Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding purely functional persistent binary trees

I'm scanning this code, and I want to understand how it works.

There are two possible trees: a tree for an empty set, and a tree consisting of an integer and two subtrees. The invariant: For each node, the node on the right hand side contains integer values higher than the parent, while the left side node contains integer values lower than the parent.

Here is the code:

abstract class IntSet{
  def incl(x: Int): IntSet         // include element x in the IntSet
  def contains(x: Int): Boolean    // is x an element of the set?
  def union(other: IntSet): IntSet // union of this and other set
}

object Empty extends IntSet{
  def contains(x: Int): Boolean = false
  def incl(x:Int): IntSet = new NonEmpty(x, Empty, Empty)
  override def toString = "."
  def union(other:IntSet): IntSet = other
}

class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet{

  def contains(x: Int): Boolean =
    if      (x < elem) left contains x
    else if (x > elem) right contains x
    else true

  def incl(x: Int): IntSet =
    if      (x < elem) new NonEmpty(elem, left incl x, right)
    else if (x > elem) new NonEmpty(elem, left, right incl x)
    else this

  override def toString = "{" + left + elem + right + "}"

  def union(other:IntSet): IntSet =
    ((left union right) union other) incl elem

}

How is this data structure used? How does it achieves persistence? How does it work?

like image 779
runw Avatar asked Oct 08 '13 03:10

runw


2 Answers

The code maps directly to the description you've provided.

Let's take a simple example to demonstrate usage: You have first an empty set, say e then you add to it an element to get another set, say s1. You will then have 2 sets, e and s1:

val e = Empty
e contains 42        // will be false
// create s1 from e
val s1 = e incl 1    // e does not change; it remains a reference to the Empty set.
// Now we have s1, a set that should contain (1) and nothing else.
s1 contains 1        // will be true
s1 contains 42       // will be false

I'm guessing you are familiar with the Scala shorthand that lets you type s1 incl 1 instead of s1.incl(1)

Note that there can only ever be one empty set, so this is just as good:

val s1 = Empty incl 1

Then let's say you wanted to add, say 2 to get another set s2 whose elements must include both {1, 2}.

val s2 = s1 incl 2
s2 contains 1       // true
s2 contains 2       // true
s2 contains 3       // false

So the method incl on any set takes an element and returns a new set - it does not change the set (the original object ob which the include method was called).

We have two types of tree-sets; the empty and the non-empty and each have an implementation for incl:

// Empty
def incl(x:Int): IntSet = new NonEmpty(x, Empty, Empty)

Reads: "adding an element to an empty (tree) set yields another set which is a non-empty tree with only a root node with value 1 and empty left and right subtrees."

The non-empty sets have a constructor argument elem which represents the root of the tree and is visible to all methods in NonEmpty.

// Non-Empty
def incl(x: Int): IntSet =
   if      (x < elem) new NonEmpty(elem, left incl x, right)
   else if (x > elem) new NonEmpty(elem, left, right incl x)
   else this

Reads: (in reverse order of the above if-else):

  • adding an element x to a non empty set whose root element is also x gives you the same set (this)
  • adding an element x to a non empty set whose root element is less than x gives you another set, where:
    • the root element is the same as the original set
    • the left subtree is unchanged - same as in the original set
    • the new right subtree becomes the original right subtree tree with x added to it": right incl x
  • adding an element x to a non empty set whose root element is greater than x gives you another set, where:
    • the root element is the same as the original set
    • the right subtree is unchanged - same as in the original set
    • the new left subtree becomes the original left subtree tree with x added to it": left incl x

The 'persistence' is achieved by the fact that none of the trees or subtrees are ever changed. In the example

val s1 = Empty incl 1      // s1 is a tree with only a root(1) an no branches.
val s2 = s1 incl 2         // s2 is another tree with - 
                           //  - the same root(1),
                           //  - the same left-subtree as s1, (happens to be Empty)
                           //  - a new subtree which in turn is a tree with - 
                           //    - the root element (2)
                           //    - no left or right brances.
s1 contains 1 // true
s1 contains 2 // false
s2 contains 1 // true
s2 contains 2 // true

val s3 = s2 incl -3        // s2.incl(-3)
                           // from s2 we get s3, which does not change s2's structure
                           // in any way.
                           // s3 is the new set returned by incl, whose
                           //  - root element remains (1)
                           //  - left subtree is a new tree that contains 
                           //    just (-3) and has empty left, right subtrees
                           //  - right subtree is the same as s2's right subtree!

s3.contains(-3) // true; -3 is contained by s3's left subtree
s3.contains(1)  // true; 1 is s3's root.
s3.contains(2)  // true; 2 is contained by s3's right subtree
s3.contains(5)  // false

We use only incl to derive sets (trees) from other sets, without changing the original set. This is because at very stage, we either -

  1. return new data-structures based on exiting ones instead of modifying existing structures, OR,
  2. return existing structures as they are.

contains works the same way: Empty has an implementation that returns false for any input. NonEmpty quickly returns true if the given element is the same as that as it's root, or if either it's left or right subtrees contain it!

like image 148
Faiz Avatar answered Nov 15 '22 03:11

Faiz


Let's start with incl. This is a method on a tree which takes an element and creates a new tree equal to the current tree but with the element added to it. It does this without modifying the original tree. This is all part of dealing with these trees as immutable data structures, and is at the heart of the idea of "persistent" data structures. Essentially, for any changes we wish to make to a tree, we want a new tree created and the previous state of the tree preserved. We also want the new tree to be created efficiently, creating it out of as few new nodes as possible, and linking in to existing nodes whereever this won't affect the original.

Take as an example the tree below:

    4
   / \
  2   6
 / \   \
1   3   7

If we want to add the element 5 to this, we want to end up with:

     4
   /   \
  2     6
 / \   / \
1  3   5  7

We can do this by creating a new root node containing the element 4, which points to the existing node (and attached subtree) containing 2, and a new node containing 6, which in turn (note the recursive nature of the call to new NonEmpty(elem, left, right **incl** x)) points to a new node containing 5 and the existing node containing 7. So only three nodes were created, and four existing nodes were reused. Note that this does not affect the original tree which can continue to refer to the leaf nodes containing 1, 2, 3 and 7.

like image 39
Shadowlands Avatar answered Nov 15 '22 04:11

Shadowlands