Union-find data structure

Tags:

c++

data-structures

union-find

For many problems, I see that the recommended solution is to use a union-find data structure. I tried to read about it and think about how it is implemented (using C++). My current understanding is that it is nothing but a list of sets. So to find which set an element belongs to, we require n*log n operations. And when we have to perform a union, we have to find the two sets which need to be merged and do a set_union on them. This doesn't look terribly efficient to me. Is my understanding of this data structure correct, or am I missing something?
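For comparison, the "list of sets" model described above might look like the sketch below. The names ListOfSets, find and unite are hypothetical, chosen only for illustration. Each lookup scans the sets one by one, and each union builds a brand-new merged set with std::set_union, which is exactly the cost that the tree-based structure in the answers avoids.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical "list of sets" model of disjoint sets, as described in the question.
struct ListOfSets {
    std::vector<std::set<int>> sets;

    explicit ListOfSets(int n) {
        for (int i = 0; i < n; ++i) sets.push_back(std::set<int>{i});  // n singleton sets
    }

    // Scan every set until one contains x: each check is O(log n), repeated once per set.
    int find(int x) const {
        for (int s = 0; s < (int)sets.size(); ++s)
            if (sets[s].count(x)) return s;
        return -1;  // x was never added
    }

    // Merge the sets containing x and y; std::set_union copies every element of both sets.
    void unite(int x, int y) {
        int a = find(x), b = find(y);
        assert(a != -1 && b != -1);
        if (a == b) return;
        std::set<int> merged;
        std::set_union(sets[a].begin(), sets[a].end(),
                       sets[b].begin(), sets[b].end(),
                       std::inserter(merged, merged.end()));
        sets[a] = std::move(merged);
        sets.erase(sets.begin() + b);  // later set indices shift down by one
    }
};
```

Note that the index returned by find is only valid until the next unite, since erasing a set shifts the ones after it.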

385

asked Nov 28 '11 17:11

Asha

2 Answers

This is quite a late reply, but this has probably not been answered in detail elsewhere on Stack Overflow, and since this is the top page for anyone searching for union-find, here is a detailed solution.

Union-find operations are very fast, running in near-constant amortized time. This implementation follows Jérémie's insights: path compression and tracking set sizes. Path compression is performed during each find operation itself; combined with linking by size, this gives an amortized bound of O(lg* n) per operation. lg* (the iterated logarithm) grows so slowly that it never exceeds 5 for any n ≤ 2^65536, and the inverse Ackermann function α(n), which gives an even tighter bound, grows more slowly still. Union (merge) is performed lazily, by simply pointing one root at the other, specifically the smaller set's root at the larger set's root; once both roots have been found, this linking step takes constant time.
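To get a feel for how slowly lg* grows, here is a small sketch that computes it with floating-point log2 (the function name log_star is hypothetical). lg* n counts how many times log2 must be applied before the value drops to 1 or below.

```cpp
#include <cassert>
#include <cmath>

// Iterated logarithm lg* n: the number of times log2 is applied
// before the value drops to 1 or below.
int log_star(double n) {
    assert(std::isfinite(n));  // log2(inf) is inf, so the loop would never end
    int count = 0;
    while (n > 1.0) {
        n = std::log2(n);
        ++count;
    }
    return count;
}
```

For example, log_star(10) is 3, log_star(100) is 4, and both log_star(1e6) and log_star(1e300) are 5: even values near the largest double stay at 5.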

Refer to the UF class below, based on https://github.com/kartikkukreja/blog-codes/blob/master/src/Union%20Find%20%28Disjoint%20Set%29%20Data%20Structure.cpp

```cpp
#include <vector>

class UF {
    std::vector<int> id;  // id[p] is p's parent; a root is its own parent
    std::vector<int> sz;  // sz[r] is the number of elements in the tree rooted at r
    int cnt;              // number of disjoint sets

public:
    // Create an empty union find data structure with N isolated sets.
    explicit UF(int N) : id(N), sz(N, 1), cnt(N) {
        for (int i = 0; i < N; i++) id[i] = i;
    }

    // Return the id of the component containing object p.
    int find(int p) {
        int root = p;
        while (root != id[root]) root = id[root];  // first pass: locate the root
        while (p != root) {                        // second pass: path compression
            int newp = id[p];
            id[p] = root;
            p = newp;
        }
        return root;
    }

    // Replace the sets containing x and y with their union.
    void merge(int x, int y) {
        int i = find(x), j = find(y);
        if (i == j) return;
        // Make the smaller root point to the larger one.
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
        cnt--;
    }

    // Are objects x and y in the same set?
    bool connected(int x, int y) { return find(x) == find(y); }

    // Return the number of disjoint sets.
    int count() const { return cnt; }
};
```

104

answered Oct 03 '22 03:10

Atif Hussain

The data structure can be represented as a forest of trees with the branches reversed: instead of pointing down, each branch points upward, linking a child to its parent. Each tree is one set, and its root serves as the set's representative.
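A minimal sketch of this representation, assuming the elements are the integers 0..n-1: the whole forest fits in one array, where parent[x] is x's parent and a root is its own parent. The function name find_root is hypothetical, and this version does no optimization at all.

```cpp
#include <cassert>
#include <vector>

// Forest stored as a parent array: parent[x] is x's parent, and a root is its own parent.
// Following the upward links from any element ends at the root that represents its set.
int find_root(const std::vector<int>& parent, int x) {
    assert(x >= 0 && x < (int)parent.size());
    while (parent[x] != x) x = parent[x];  // climb toward the root
    return x;
}
```

For example, parent = {0, 0, 1, 3, 3} encodes two sets: {0, 1, 2} rooted at 0 (with 2 → 1 → 0) and {3, 4} rooted at 3, so find_root(parent, 2) is 0 and find_root(parent, 4) is 3.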

If I remember correctly, it can be shown (easily) that:

- path compression (whenever you look up the root of the set containing an element A, you "compress" the path by pointing every node on it directly at the root, so future lookups for those nodes take O(1) time) leads to O(log n) amortized complexity per call;
- balancing (you keep track, at least approximately, of the size of each tree, and when you have to "unite" two sets, you make the root of the smaller tree a child of the root of the larger one) also leads to O(log n) complexity per call, because every tree's height then stays logarithmic in its size.
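A small experiment can make the two claims above concrete. The sketch below stores the forest as a parent array and measures tree height with and without each optimization; the names Forest, find_compress, unite and chain are hypothetical, and balancing here uses exact set sizes rather than an approximation.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical parent-array forest, used only to compare the two optimizations.
struct Forest {
    std::vector<int> parent, size;

    explicit Forest(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);  // each element starts as its own root
    }

    // Plain find without compression, so the tree shape is left untouched.
    int find(int x) const {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Path compression: after locating the root, point every node on the path at it.
    int find_compress(int x) {
        int root = find(x);
        while (x != root) {
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }

    // Link the two roots; with by_size, the smaller tree goes under the larger one.
    void unite(int a, int b, bool by_size) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (by_size && size[ra] > size[rb]) std::swap(ra, rb);  // ra is now the smaller tree
        parent[ra] = rb;
        size[rb] += size[ra];
    }

    // Height of the tallest tree: the longest chain of parent links up to a root.
    int height() const {
        int h = 0;
        for (int x = 0; x < (int)parent.size(); ++x) {
            int d = 0;
            for (int y = x; parent[y] != y; y = parent[y]) ++d;
            h = std::max(h, d);
        }
        return h;
    }
};

// Merge the elements 0..n-1 one at a time by calling unite(i, i + 1).
Forest chain(int n, bool by_size) {
    assert(n >= 1);
    Forest f(n);
    for (int i = 0; i + 1 < n; ++i) f.unite(i, i + 1, by_size);
    return f;
}
```

With n = 1024, naive linking builds a single path of height 1023; one compressed find from the deepest node flattens it to height 1, and linking by size keeps the tree at height 1 for this merge order. In general, union by size guarantees a height of at most log2 n.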

A more involved proof shows that when you combine both optimizations, you obtain an amortized complexity per operation of the inverse Ackermann function, written α(n); proving this bound was Tarjan's main contribution to this structure.
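Combining both optimizations can be sketched as follows. This version assumes union by rank (rank is an upper bound on a tree's height) rather than the exact sizes used in the other answer; both linking rules, together with path compression, give the same α(n) amortized bound. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Sketch of a disjoint-set forest with path compression and union by rank.
class DisjointSets {
public:
    explicit DisjointSets(int n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), 0);  // every element starts as its own root
    }

    // Recursive find: on the way back, each node on the path is pointed directly at the root.
    int find(int x) {
        assert(x >= 0 && x < (int)parent_.size());
        if (parent_[x] != x) parent_[x] = find(parent_[x]);
        return parent_[x];
    }

    // Union by rank: the root with the lower rank goes under the root with the higher rank.
    // Returns false if a and b were already in the same set.
    bool unite(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        if (rank_[ra] < rank_[rb]) std::swap(ra, rb);
        parent_[rb] = ra;
        if (rank_[ra] == rank_[rb]) ++rank_[ra];  // equal ranks: the merged tree may grow by one
        return true;
    }

private:
    std::vector<int> parent_;
    std::vector<int> rank_;
};
```

Because union by rank keeps every tree's height at most log2 n, the recursion depth of find stays small.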

It was later shown, I believe, that for some specific usage patterns this complexity is actually constant (though for all practical purposes the inverse Ackermann function is at most 4 anyway). According to the Wikipedia page on Union-Find, in 1989 the amortized cost per operation of any equivalent data structure was shown to be Ω(α(n)), proving that the standard implementation is asymptotically optimal.

answered Oct 03 '22 03:10

Jérémie
