I'm currently trying to understand the Red-Black tree implementation as given by Okasaki and delete methods by Kahrs (the untyped version).
In the delete implementation a function app
is used, which seems to "merging" the children of the node being deleted. And again, the algorithm seems to use the same "break" the Red-Red property rather than black height (please correct me if I'm wrong).. We are always creating Red Nodes (even if we break the Red-Red property). walk down the subtree rooted at the node being deleted, correct the red-red violations, once we reach the leafs, we start going up the path (starting at the "new tree" merge created) fixing the red-red violation up the path.
app :: RB a -> RB a -> RB a
app E x = x
app x E = x
app (T R a x b) (T R c y d) =
case app b c of
T R b' z c' -> T R(T R a x b') z (T R c' y d)
bc -> T R a x (T R bc y d)
app (T B a x b) (T B c y d) =
case app b c of
T R b' z c' -> T R(T B a x b') z (T B c' y d)
bc -> balleft a x (T B bc y d)
app a (T R b x c) = T R (app a b) x c
app (T R a x b) c = T R a x (app b c)
bh-1
and bh
subtrees at some node up the path. any pointers to an explanation for this "merge" operation would be great.
please note this is not a homework problem or anything else. I'm studying independently the implementations given in Okasaki and fill in the "messy" deletes too.
Thanks.
Given that there is a lot that can be said on this topic, I'll try to stick as closely as possible to your questions, but let me know if I missed something important.
app
doing?You are correct in that app
breaks the red invariant rather than the black one on the way down and fixes this on the way back up.
It appends or merges two subtrees that obey the order property, black invariant, red invariant, and have the same black-depth into a new tree that also obeys the order property, black invariant, and red invariant. The one notable exception is that the root or app rb1 rb2
sometimes is red and has two red subtrees. Such trees are said to be "infrared". This is dealt with in delete
by just setting the root to be black in this line.
case del t of {T _ a y b -> T B a y b; _ -> E}
Claim 1 (Order property) if the inputs rb1
and rb2
obey the order property individually (left subtree < node value < right subtree) and the max value in rb1
is less than the min value in rb2
, then app rb1 rb2
also obeys the order property.
This one is easy to prove. In fact, you can even sort of see it when looking at the code - the a
s always stays to the left of the b
s or b'
s, which always stay to the left of the c
s or c'
s, which always stay to the left of the d
s. And the b'
s and c'
s also obey this property since they are the result of recursive calls to app
with subtrees b
and c
satisfying the claim.
Claim 2 (Red invariant) if the inputs rb1
and rb2
obey the red invariant (if a node is red, then both its children are black), then so do all the nodes in app rb1 rb2
, except for possibly the root. However, the root can be "infrared" only when one of its arguments has a red root.
Proof The proof is by branching on the patterns.
app E x = x
and app x E = x
the claim is trivialapp (T R a x b) (T R c y d)
, the claim hypothesis tells us all of a
, b
, c
, and d
are black. It follows that app b c
obeys the red invariant fully (it is not infrared).
app b c
matches T R b' z c'
then its subtrees b'
and c'
must be black (and obey the red invariant), so T R (T R a x b') z (T R c' y d)
obeys the red-invariant with an infrared root.app b c
produced a black node bc
, so T R a x (T R bc y d)
obeys the red invariant.For app (T B a x b) (T B c y d)
we only care that app b c
will itself obey the red-invariant
app b c
is a red node, it can be infrared but its subtrees b'
and c'
, on the other hand, must obey the red invariant completely. That means T R (T B a x b') z (T B c' y d)
also obeys the red invariant.Now, if bc
turns out to be black, we call balleft a x (T B bc y d)
. The neat thing here is that we already know which two cases of balleft
can be triggered: depending on whether a
is red or black
balleft (T R a x b) y c = T R (T B a x b) y c
balleft bl x (T B a y b) = balance bl x (T R a y b)
In the first case, what ends up happening is that we swap the color of the left subtree from red to black (and doing so never breaks the red-invariant) and put everything in a red subtree. Then balleft a x (T B bc y d)
actually looks like T R (T B .. ..) (T B bc y d)
, and that obeys the red invariant.
Otherwise, the call to balleft
turns into balance a x (T R bc y d)
and the whole point of balance
is that it fixes root level red violations.
For app a (T R b x c)
we know that a
and b
must be black, so app a b
is not infrared, so T R (app a b) x c
obeys the red invariant. The same holds for app a (T R b x c)
but with letters a
, b
, and c
permuted.
Claim 3 (Black invariant) if the inputs rb1
and rb2
obey the black invariant (any path from the root to the leaves has the same number of black nodes on the way) and have the same black-depth, app rb1 rb2
also obeys the black invariant and has the same black-depth.
Proof The proof is by branching on the patterns.
app E x = x
and app x E = x
the claim is trivialapp (T R a x b) (T R c y d)
we know that since T R a x b
and T R c y d
have the same black depth, so do their subtrees a
, b
, c
, and d
. By the claim (remember, this is induction!) app b c
will also obey the black invariant and have the same black depth. Now, we branch our proof on case app b c of ...
app b c
matches T R b' z c'
it is red and its subtrees b'
and c'
will have the same black depth as app b c
(itself), which in turn has the same black depth as a
and d
, so T R (T R a x b') z (T R c' y d)
obeys the black invariant and has the same black depth as its inputs.app b c
produced a black node bc
, but again that node has the same black depth as a
and d
, so T R a x (T R bc y d)
as a whole still obeys the black invariant and has the same black depth as its inputs.app (T B a x b) (T B c y d)
we again know immediately that subtrees a
, b
, c
, and d
all have the same black depth as app b c
. As before, we branch our proof on case app b c of ...
app b c
is a red node of the form T R b' z c'
, we again get that b'
, c'
, a
, and d
have the same black-depth, so T R (T B a x b') z (T B c' y d)
also has this same black depthbc
turns out to be black, we apply the same reasoning as in our previous claim to figure out that the output balleft a x (T B bc y d)
actually has the form either
T R (T B .. ..) (T B bc y d)
where (T B .. ..)
is just a
recolored as black so the overall tree will satisfy the black invariant and will have black-depth one more than any of a
, b
, c
, or d
, which is to say the same black-depth as the inputs T B a x b
and T B c y d)
.balance a x (T R bc y d)
and balance
maintains the black invariant.app a (T R b x c)
or app (T R a x b) c
, we know that a
, b
, and c
all have the same black-depth and consequently, which means app a b
and app b c
have this same black-depth, which means T R (app a b) x c
and T R (app a b) x c
also have this same black-depthMy Racket is a bit rusty so I don't have a great answer to this. In general, app
makes delete fast by allowing you to do everything in two steps: you go down to the target site, then you continue downwards to merge the subtrees, then you come back up fixing the invariants as you go, all the way to the root.
In the paper you reference, once you get down to the target site, you call min/delete
(I think this is the key difference - the rotations otherwise look pretty similar) which will recursively call itself to find the element in the subtree that you can plop into the target site and the state of the subtree after that element has been taken out. I believe that last part is what hurts the performance of that implementation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With