I'm looking at the implementation of UnionFind with union by rank and path compression from here http://en.wikipedia.org/wiki/Disjoint-set_data_structure#Disjoint-set_forests (it's pretty much the same pseudo-code as in CLRS) and don't understand why path compression doesn't change rank. If we call find
for an endpoint of the longest path from the root the rank should go down and if it doesn't the next union
operation will choose an incorrect root.
Union by rank ensures that the maximum depth of the tree is log N, so it puts a worst case upper bound of O (log N) on every operation. Path compression without any special union rules an upper bound of O (log N) on the amortized cost of each operation, but doesn't limit the worst case cost.
Because of path compression, rank is no longer depth of the tree (it becomes an upper bound on depth). Suppose T1 has 2 branches (let rank of T1 be 3), and T2 has depth 2 and rank 2. Now suppose we perform find operation (with path compression) on the leaf of T1 marked with "*" below.
To implement union by rank, each element is associated with a rank. Initially a set has one element and a rank of zero. Both trees have the different ranks – the resulting set’s rank is the larger of the two. Ranks are used instead of height or depth because path compression will change the trees’ heights over time.
We can optimize it by using union by rank. Union by rank always attaches the shorter tree to the root of the taller tree. To implement union by rank, each element is associated with a rank. Initially a set has one element and a rank of zero. Both trees have the same rank – the resulting set’s rank is one larger
"Rank" is one of those horribly overloaded terms in theoretical computer science. As Wikipedia notes, in the context of that disjoint set data structure with path compression, rank is not an intrinsic property of the current topology of the forest -- there's just no good way to keep the height of each node up to date. As defined by the sequence of unions, however, rank is useful in proving the running time bound involving the inverse Ackermann function.
Rank is not the actual depth of the tree rather it is an upper bound. As such, on a find
operation, the rank is allowed to get out of sync with the depth.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With