Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do path compression and union by rank complement each other?

I have been reading about union-find problem. The two main improvements are path compression and union by rank. As far as I understand union by rank is used to determine how to combine disjoint trees. If we have two disjoint trees T1 and T2, then we attach the root of the tree with smaller rank to the tree with higher rank. If we don't use path compression then rank is just the depth of a tree. This makes sense since we don't want to increase the depth of out tree,since it directly affects both finds and unions.

My problem is when we use path compression as well. I keep reading that the two optimizations complement each other, but I don't see that. Because of path compression, rank is no longer depth of the tree (it becomes an upper bound on depth). Suppose T1 has 2 branches (let rank of T1 be 3), and T2 has depth 2 and rank 2. Now suppose we perform find operation (with path compression) on the leaf of T1 marked with "*" below. Now if we union root of T1 and root of T2, then T2 would be attached to the root of T1(since rank is not updated by find). The resulting tree has depth 3. But we could have had a better performance if we attached T1 to T2.

T1:   o   (Rank = 3)    T2:   o  (Rank = 2)
     / \                      |
    o   o                     o
    |                         |
    o                         o
    |   
    *   

After a find on leaf of T1("*"), then union on roots of T1 and T2 we get

T1:    o       (Rank = 3)      T2: o  (Rank = 2)       
     /| |\                         |
    * o o o                        o
                                   |
                                   o
Result of T1 union T2
       o
   / | | |\
  * o o o  o   Rank = 3 and Max Depth = 3
           |
           o
           |
           o

Am i missing something here? How do path compression and union by rank complement each other? I know rank is just an upper bound on depth of a tree but I don't see how union by rank is improving the overall performance of the structure. How is this better than a union that combines the roots randomly?

Thanks for the help in advance.

like image 599
Hermon Avatar asked Jan 16 '17 23:01

Hermon


People also ask

What is path compression in Union find?

The idea of path compression is to make the found root as parent of x so that we don't have to traverse all intermediate nodes again. If x is root of a subtree, then path (to root) from all nodes under x also compresses.

Does path compression reduce the height of a tree?

Both trees have the different ranks – the resulting set's rank is the larger of the two. Ranks are used instead of height or depth because path compression will change the trees' heights over time.

Why do we need rank in Union find?

The idea of UNION BY RANK is to ensure that when we combine two trees, we try to keep the overall depth of the resulting tree small.

What heuristics are applied to the union and find operations to improve their running times?

Heuristics to improve the running time The first heuristic, union by rank, is similar to the weighted-union heuristic we used with the linked-list representation. The idea is to make the root of the tree with fewer nodes point to the root of the tree with more nodes.


1 Answers

Union by rank ensures that the maximum depth of the tree is log N, so it puts a worst case upper bound of O(log N) on every operation.

Path compression without any special union rules an upper bound of O(log N) on the amortized cost of each operation, but doesn't limit the worst case cost. (There might even be a tighter bound on the amortized cost, but O(log N) is the one I know how to prove)

Using both together, you get the O(log N) limit on the worst case and the amortized bound improves to O(𝛼(N)), which is effectively constant. In this way the two optimizations are complementary.

You are right that there are sequences of operations for which union by rank is not absolutely optimal, but the guarantees are better than what you get without it, and that is what counts. We don't usually spend effort optimizing the best case performance. We optimize the worst or average cases.

like image 160
Matt Timmermans Avatar answered Sep 25 '22 05:09

Matt Timmermans