Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Red Black Trees: Kahrs version

I'm currently trying to understand the Red-Black tree implementation as given by Okasaki and delete methods by Kahrs (the untyped version).

In the delete implementation a function app is used, which seems to "merging" the children of the node being deleted. And again, the algorithm seems to use the same "break" the Red-Red property rather than black height (please correct me if I'm wrong).. We are always creating Red Nodes (even if we break the Red-Red property). walk down the subtree rooted at the node being deleted, correct the red-red violations, once we reach the leafs, we start going up the path (starting at the "new tree" merge created) fixing the red-red violation up the path.

app :: RB a -> RB a -> RB a
app E x = x
app x E = x
app (T R a x b) (T R c y d) =
    case app b c of
        T R b' z c' -> T R(T R a x b') z (T R c' y d)
        bc -> T R a x (T R bc y d)
app (T B a x b) (T B c y d) = 
    case app b c of
        T R b' z c' -> T R(T B a x b') z (T B c' y d)
        bc -> balleft a x (T B bc y d)
app a (T R b x c) = T R (app a b) x c
app (T R a x b) c = T R a x (app b c)
  1. I'm not able to see how we are "not creating" / "fixing" the black height violation? deleting a black node would create bh-1 and bh subtrees at some node up the path.
  2. Results from this paper, looks like this implementation is really fast and that the "merge" method might be the key to answering the increase in speed.

any pointers to an explanation for this "merge" operation would be great.

please note this is not a homework problem or anything else. I'm studying independently the implementations given in Okasaki and fill in the "messy" deletes too.

Thanks.

like image 448
user3169543 Avatar asked Aug 01 '16 14:08

user3169543


1 Answers

Given that there is a lot that can be said on this topic, I'll try to stick as closely as possible to your questions, but let me know if I missed something important.

What the hell is app doing?

You are correct in that app breaks the red invariant rather than the black one on the way down and fixes this on the way back up.

It appends or merges two subtrees that obey the order property, black invariant, red invariant, and have the same black-depth into a new tree that also obeys the order property, black invariant, and red invariant. The one notable exception is that the root or app rb1 rb2 sometimes is red and has two red subtrees. Such trees are said to be "infrared". This is dealt with in delete by just setting the root to be black in this line.

 case del t of {T _ a y b -> T B a y b; _ -> E}

Claim 1 (Order property) if the inputs rb1 and rb2 obey the order property individually (left subtree < node value < right subtree) and the max value in rb1 is less than the min value in rb2, then app rb1 rb2 also obeys the order property.

This one is easy to prove. In fact, you can even sort of see it when looking at the code - the as always stays to the left of the bs or b's, which always stay to the left of the cs or c's, which always stay to the left of the ds. And the b's and c's also obey this property since they are the result of recursive calls to app with subtrees b and c satisfying the claim.

Claim 2 (Red invariant) if the inputs rb1 and rb2 obey the red invariant (if a node is red, then both its children are black), then so do all the nodes in app rb1 rb2, except for possibly the root. However, the root can be "infrared" only when one of its arguments has a red root.

Proof The proof is by branching on the patterns.

  • For cases app E x = x and app x E = x the claim is trivial
  • For app (T R a x b) (T R c y d), the claim hypothesis tells us all of a, b, c, and d are black. It follows that app b c obeys the red invariant fully (it is not infrared).
    • If app b c matches T R b' z c' then its subtrees b' and c' must be black (and obey the red invariant), so T R (T R a x b') z (T R c' y d) obeys the red-invariant with an infrared root.
    • Otherwise, app b c produced a black node bc, so T R a x (T R bc y d) obeys the red invariant.
  • For app (T B a x b) (T B c y d) we only care that app b c will itself obey the red-invariant

    • If app b c is a red node, it can be infrared but its subtrees b' and c', on the other hand, must obey the red invariant completely. That means T R (T B a x b') z (T B c' y d) also obeys the red invariant.
    • Now, if bc turns out to be black, we call balleft a x (T B bc y d). The neat thing here is that we already know which two cases of balleft can be triggered: depending on whether a is red or black

      balleft (T R a x b) y c = T R (T B a x b) y c
      balleft bl x (T B a y b) = balance bl x (T R a y b)
      
      • In the first case, what ends up happening is that we swap the color of the left subtree from red to black (and doing so never breaks the red-invariant) and put everything in a red subtree. Then balleft a x (T B bc y d) actually looks like T R (T B .. ..) (T B bc y d), and that obeys the red invariant.

      • Otherwise, the call to balleft turns into balance a x (T R bc y d) and the whole point of balance is that it fixes root level red violations.

  • For app a (T R b x c) we know that a and b must be black, so app a b is not infrared, so T R (app a b) x c obeys the red invariant. The same holds for app a (T R b x c) but with letters a, b, and c permuted.

Claim 3 (Black invariant) if the inputs rb1 and rb2 obey the black invariant (any path from the root to the leaves has the same number of black nodes on the way) and have the same black-depth, app rb1 rb2 also obeys the black invariant and has the same black-depth.

Proof The proof is by branching on the patterns.

  • For cases app E x = x and app x E = x the claim is trivial
  • For app (T R a x b) (T R c y d) we know that since T R a x b and T R c y d have the same black depth, so do their subtrees a, b, c, and d. By the claim (remember, this is induction!) app b c will also obey the black invariant and have the same black depth. Now, we branch our proof on case app b c of ...
    • If app b c matches T R b' z c' it is red and its subtrees b' and c' will have the same black depth as app b c (itself), which in turn has the same black depth as a and d, so T R (T R a x b') z (T R c' y d) obeys the black invariant and has the same black depth as its inputs.
    • Otherwise, app b c produced a black node bc, but again that node has the same black depth as a and d, so T R a x (T R bc y d) as a whole still obeys the black invariant and has the same black depth as its inputs.
  • For app (T B a x b) (T B c y d) we again know immediately that subtrees a, b, c, and d all have the same black depth as app b c. As before, we branch our proof on case app b c of ...
    • If app b c is a red node of the form T R b' z c', we again get that b', c', a, and d have the same black-depth, so T R (T B a x b') z (T B c' y d) also has this same black depth
    • Now, if bc turns out to be black, we apply the same reasoning as in our previous claim to figure out that the output balleft a x (T B bc y d) actually has the form either
      • T R (T B .. ..) (T B bc y d) where (T B .. ..) is just a recolored as black so the overall tree will satisfy the black invariant and will have black-depth one more than any of a, b, c, or d, which is to say the same black-depth as the inputs T B a x b and T B c y d).
    • balance a x (T R bc y d) and balance maintains the black invariant.
  • For app a (T R b x c) or app (T R a x b) c, we know that a, b, and c all have the same black-depth and consequently, which means app a b and app b c have this same black-depth, which means T R (app a b) x c and T R (app a b) x c also have this same black-depth

Why is it fast?

My Racket is a bit rusty so I don't have a great answer to this. In general, app makes delete fast by allowing you to do everything in two steps: you go down to the target site, then you continue downwards to merge the subtrees, then you come back up fixing the invariants as you go, all the way to the root.

In the paper you reference, once you get down to the target site, you call min/delete (I think this is the key difference - the rotations otherwise look pretty similar) which will recursively call itself to find the element in the subtree that you can plop into the target site and the state of the subtree after that element has been taken out. I believe that last part is what hurts the performance of that implementation.

like image 182
Alec Avatar answered Nov 04 '22 04:11

Alec