Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Disjoint-set forests - why should the rank be increased by one when the find of two nodes are of same rank?

I am implementing the disjoint-set datastructure to do union find. I came across the following statement in Wikipedia:

... whenever two trees of the same rank r are united, the rank of the result is r+1.

Why should the rank of the joined tree be increased by only one when the trees are of the same rank? What happens if I simply add the two ranks (i.e. 2*r)?

like image 475
jeffreyveon Avatar asked Dec 01 '22 03:12

jeffreyveon


1 Answers

First, what is rank? It is almost the same as the height of a tree. In fact, for now, pretend that it is the same as the height.

We want to keep trees short, so keeping track of the height of every tree helps us do that. When unioning two trees of different height, we make the root of the shorter tree a child of the root of the taller tree. Importantly, this does not change the height of the taller tree. That is, the rank of the taller tree does not change.

However, when unioning two trees of the same height, we make one root the child of the other, and this increases the height of that overall tree by one, so we increase the rank of that root by one.

Now, I said that rank was almost the same as the height of the tree. Why almost? Because of path compression, a second technique used by the union-find data structure to keep trees short. Path compression can alter an existing tree to make it shorter than indicated by its rank. In principle, it might be better to make decisions based on the actual height than using rank as a proxy for height, but in practice, it is too hard/too slow to keep track of the true height information, whereas it is very easy/fast to keep track of rank.

You also asked "What happens if I simply add the two ranks (i.e. 2*r)?" This is an interesting question. The answer is probably nothing, meaning everything will still work just fine, with the same efficiency as before. (Well, assuming that you use 1 as your starting rank rather than 0.) Why? Because the way rank is used, what matters is the relative ordering of ranks, not their absolute magnitudes. If you add them, then your ranks will be 1,2,4,8 instead of 1,2,3,4 (or more likely 0,1,2,3), but they will still have exactly the same relative ordering so all is well. Your rank is simply 2^(the old rank). The biggest danger is that you run a larger risk of overflowing the integer used to represent the rank when dealing with very large sets (or, put another way, that you will need to use more space to store your ranks).

On the other hand, notice that by adding the two ranks, you are approximating the size of the trees rather than the heights of the trees. By always adding the two ranks, whether they are equal or not, then you are exactly tracking the sizes of the trees. Again, everything works just fine, with the same caveats about the possibility of overflowing integers if your trees are very large.

In fact, union-by-size is widely recognized as a legitimate alternative to union-by-rank. For some applications, you actually want to know the sizes of the sets, and for those applications union-by-size is actually preferabe to union-by-rank.

like image 168
Chris Okasaki Avatar answered Apr 28 '23 04:04

Chris Okasaki