Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

path compression is enough for disjoint-set forests , why do we need union by rank

MAKE-SET(x)
    x.p = x
    x.rank = 0

UNION(x, y)
     LINK(FIND-SET(x),FIND-SET(y))

LINK(x, y)
    if x.rank > y.rank
        y.p = x
    else 
        x.p = y
        if x.rand == y.rank
            y.rank = y.rank +1

The FIND-SET procedure with path compression is quite simple:
FIND-SET(x)
    if x != x.p
        x.p = FIND-SET(x.p)
    return x.p

You can find the pseudocode in Introduction to Algorithms 3rd at chapter 21.

this is the pseudocode for disjoint-set forests with rank and path compression. From the pseudocode, we can see that each time before union operation, we will first find the set number of each node. In the FIND-SET operation with path compression, the height of x and y will always become only 2. Because x.p and y.p will both point to the set's root after FIND-SET. Why union by rank still needed ?


Shihab Shahriar have solved my problem, his answer is really impressive!

like image 922
Damon Hu Avatar asked Mar 30 '18 12:03

Damon Hu


Video Answer


1 Answers

Note that we apply path compression only when we perform find-set operation, and path compression can't be applied when we perform the union of two sets.

In union by rank, we take care of the fact that the root of the tree with less rank (or less depth/height) is made to point to the root of the tree with the high rank (or more depth/height). This makes sure that the tree representing the set never gets skewed.

An example where union by rank is performed:

depth=1,n=2 depth=0,n=1 depth=1,n=3 O U O = O / / \ O O O

If we didn't perform union by rank then the tree might become something like this:

depth=1,n=2 depth=0,n=1 depth=2,n=3 O U O = O / / O O / O i.e. it's height is increased.

You can do an amortized analysis and calculate the time complexity of find-set when union by rank is applied then you will find that time is never going to exceed O(log2(n)).

So, if you don't perform union by rank then the find-set operation will take O(d) time (d represents the depth of tree) and in worst case d can become n(number of elements in the set). So for find-set operation, the time complexity would become O(n) for the first time. However, for the next-subsequent find-set operation the time may get decreased, but the point is we don't want O(n) time for any find-set operation. Thus, for the case when there are several union operations and at the end, there is a find-set operation then O(n) time would be consumed if you don't use union by rank.

We use both the union by rank and path compression together because we want to decrease the height of the tree and make it less than log2(n) (n is the number of elements in the disjoint-set), the ultimate goal is to make the height of the tree to almost one.

like image 192
Amit Upadhyay Avatar answered Nov 15 '22 12:11

Amit Upadhyay