For many problems I see the recommended solution is to use a union-find data structure. I tried to read about it and think about how it would be implemented (in C++). My current understanding is that it is nothing but a list of sets, so finding which set an element belongs to requires O(n log n) operations. And when we have to perform a union, we find the two sets that need to be merged and do a set_union on them. This doesn't look terribly efficient to me. Is my understanding of this data structure correct, or am I missing something?
In Kruskal's algorithm, the union-find data structure is used as a subroutine to detect whether adding an edge would create a cycle in the graph, which is what lets the algorithm build a minimum spanning tree.
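To make that concrete, here is a minimal sketch of how Kruskal's algorithm uses the structure: edges are processed in order of increasing weight, and an edge is kept only if its endpoints are not already in the same set. The names DisjointSet and kruskalMstWeight are my own, and this helper skips the size balancing that the full class later in this thread includes.

#include <algorithm>
#include <tuple>
#include <vector>

// Minimal union-find used only for this sketch; a fuller version appears below.
struct DisjointSet {
    std::vector<int> parent;
    DisjointSet(int n) : parent(n) { for (int i = 0; i < n; i++) parent[i] = i; }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;   // same set: this edge would close a cycle
        parent[a] = b;
        return true;
    }
};

// Kruskal: sort edges by weight, keep an edge iff it connects two different sets.
long long kruskalMstWeight(int n, std::vector<std::tuple<int, int, int>> edges) {
    std::sort(edges.begin(), edges.end());   // edges stored as (weight, u, v)
    DisjointSet ds(n);
    long long total = 0;
    for (auto [w, u, v] : edges)
        if (ds.unite(u, v)) total += w;      // rejected edges are the cycle edges
    return total;
}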
A union-find algorithm is an algorithm that performs two useful operations on such a data structure:
Find: determine which subset a particular element is in. This can be used to check whether two elements are in the same subset.
Union: join two subsets into a single subset.
In this lecture we describe the union-find problem. This is a problem that captures a key task one needs to solve in order to efficiently implement Kruskal's minimum-spanning-tree algorithm. We then give two data structures for it with good amortized running time.
Algorithms and Data Structures: Union-find is an important data structure beyond graphs, as it lets us work efficiently with equivalence classes whenever we can easily designate one element as the canonical representative of its class. Programming: we leave it to the reader to study the code that implements union-find.
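As a small illustration of that equivalence-class use (my own sketch, not from the lecture notes): arbitrary keys can be mapped to indices, and each class is then named by whatever canonical representative find returns.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// find with path compression on a plain parent array
int find(std::vector<int>& parent, int x) {
    return parent[x] == x ? x : parent[x] = find(parent, x);
}

int main() {
    // Declare that "cat" ~ "feline" and "dog" ~ "canine" ~ "hound".
    std::map<std::string, int> id = {{"cat", 0}, {"feline", 1}, {"dog", 2}, {"canine", 3}, {"hound", 4}};
    std::vector<int> parent = {0, 1, 2, 3, 4};
    parent[find(parent, id["feline"])] = find(parent, id["cat"]);   // union by linking roots
    parent[find(parent, id["canine"])] = find(parent, id["dog"]);
    parent[find(parent, id["hound"])]  = find(parent, id["dog"]);
    // Two keys are equivalent iff their canonical representatives coincide.
    std::cout << (find(parent, id["cat"]) == find(parent, id["feline"])) << "\n";  // 1
    std::cout << (find(parent, id["cat"]) == find(parent, id["dog"]))    << "\n";  // 0
}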
This is quite a late reply, but it has probably not been answered in detail elsewhere on Stack Overflow, and since this is the top page for someone searching for union-find, here is the detailed solution.
Union-Find is very fast, with operations running in near-constant time. It follows Jeremie's insights of path compression and tracking set sizes. Path compression is performed during each find operation itself, so a find takes amortized lg*(n) time; lg* (the iterated logarithm) behaves like the inverse Ackermann function in that it grows so slowly that it is rarely beyond 5 (at least until n < 2^65535). Union/merge of sets is performed lazily by just pointing one root at another, specifically the smaller set's root at the larger set's root, which completes in constant time.
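To see just how slowly lg* grows, here is a quick illustrative computation (my own, not part of the original answer): lg*(n) counts how many times log2 must be applied before the value drops to at most 1.

#include <cmath>
#include <cstdio>

// Iterated logarithm: how many times log2 must be applied until the value <= 1.
int logStar(double n) {
    int k = 0;
    while (n > 1.0) { n = std::log2(n); k++; }
    return k;
}

int main() {
    // Even for n = 10^18, the bound is tiny.
    std::printf("%d %d %d\n", logStar(16), logStar(65536), logStar(1e18));
    // prints: 3 4 5
}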
Refer to the code below, from https://github.com/kartikkukreja/blog-codes/blob/master/src/Union%20Find%20%28Disjoint%20Set%29%20Data%20Structure.cpp
class UF {
    int *id, cnt, *sz;

public:
    // Create an empty union-find data structure with N isolated sets.
    UF(int N) {
        cnt = N;
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    ~UF() {
        delete[] id;
        delete[] sz;
    }

    // Return the id of the component corresponding to object p.
    int find(int p) {
        int root = p;
        while (root != id[root]) root = id[root];
        // Path compression: point every node on the path directly at the root.
        while (p != root) {
            int newp = id[p];
            id[p] = root;
            p = newp;
        }
        return root;
    }

    // Replace the sets containing x and y with their union.
    void merge(int x, int y) {
        int i = find(x);
        int j = find(y);
        if (i == j) return;
        // Make the smaller root point to the larger one.
        if (sz[i] < sz[j]) {
            id[i] = j;
            sz[j] += sz[i];
        } else {
            id[j] = i;
            sz[i] += sz[j];
        }
        cnt--;
    }

    // Are objects x and y in the same set?
    bool connected(int x, int y) { return find(x) == find(y); }

    // Return the number of disjoint sets.
    int count() { return cnt; }
};
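A quick usage sketch of the class above (assuming the UF class is in the same file; the element values are just an example):

#include <iostream>

int main() {
    UF uf(5);                    // five singleton sets: {0} {1} {2} {3} {4}
    uf.merge(0, 1);
    uf.merge(2, 3);
    uf.merge(1, 3);              // now {0,1,2,3} and {4}
    std::cout << uf.connected(0, 3) << "\n";  // 1
    std::cout << uf.connected(0, 4) << "\n";  // 0
    std::cout << uf.count() << "\n";          // 2
}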
The data structure can be pictured as a forest of trees with the branches reversed: instead of pointing down to its children, each node points up to its parent.
If I remember correctly, it can be shown (fairly easily):
that path compression (whenever you look up the representative of a set, you "compress" the path so that each future lookup from those nodes reaches the root in O(1)) leads to O(log n) amortized complexity per call (a hand-traced example of the compression step follows this list);
that balancing (you keep approximate track of the number of children each root has, and when you have to "unite" two sets, you make the one with fewer children a child of the one with more) also leads to O(log n) complexity per call.
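Here is the promised hand-traced example of the compression step (my own illustration, using the same parent-array representation as the class above): before find(3) on the chain 3 -> 2 -> 1 -> 0, each node points at the next; afterwards, every visited node points straight at the root.

#include <iostream>
#include <vector>

// find with full path compression on a plain parent array
int find(std::vector<int>& id, int p) {
    int root = p;
    while (root != id[root]) root = id[root];
    while (p != root) {
        int next = id[p];
        id[p] = root;
        p = next;
    }
    return root;
}

int main() {
    std::vector<int> id = {0, 0, 1, 2};  // chain: 3 -> 2 -> 1 -> 0
    find(id, 3);
    // After compression every node on the path points directly at the root:
    for (int x : id) std::cout << x << ' ';  // prints: 0 0 0 0
}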
A more involved proof shows that when you combine both optimizations, you obtain an amortized complexity given by the inverse Ackermann function, written α(n); proving this bound was Tarjan's main contribution for this structure.
It was later shown, I believe, that for some specific usage patterns this complexity is actually constant (though for all practical purposes the inverse Ackermann function is about 4). According to the Wikipedia page on union-find, in 1989 the amortized cost per operation of any equivalent data structure was shown to be Ω(α(n)), proving that the current implementation is asymptotically optimal.