So I'm teaching myself some graph algorithms, and I'm now on Kruskal's. I understand that it's recommended to use union-find so that checking whether adding an edge creates a cycle only takes O(log V) time. For practical purposes I see why you'd want to, but strictly in terms of Big O notation, does doing so actually affect the worst-case complexity?
My reasoning: if instead of union-find we did a DFS to check for cycles, the runtime for each check would be O(E + V), and you have to perform that V times for a runtime of O(V^2 + VE). That's more than with union-find, which would be O(V log V), but the bulk of the complexity of Kruskal's comes from deleting the minimum element of the priority queue E times, which is O(E log E), and that's the Big O answer. I don't really see a space advantage either, since union-find takes O(V) space and so do the data structures you need to find a cycle using DFS.
So, a probably overly long explanation for a simple question: does using union-find in Kruskal's algorithm actually affect the worst-case runtime?
For Kruskal's algorithm, we initialize a union-find data structure over the vertices. Its runtime is usually written O(E log E) rather than O(E log V), though the two bounds are equivalent: E = O(V^2), so log E = O(log V).
> I understand that it's recommended to use union-find so checking whether adding an edge creates a cycle only takes O(log V) time
This isn't right. Using union-find, m operations take O(m * alpha(n)) total, where alpha(n) is the inverse of the Ackermann function and, for all intents and purposes, can be considered constant. So it's much faster than logarithmic:

> Since alpha(n) is the inverse of this function, alpha(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
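As a concrete illustration, here is a minimal sketch of the union-find (disjoint-set) structure with path compression and union by rank, the combination that yields the near-constant O(alpha(n)) amortized cost per operation. The class and method names are just illustrative.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression (halving): point nodes on the path closer to the root.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected: adding (a, b) would create a cycle
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

The `False` return from `union` is exactly the cycle check Kruskal's algorithm needs: if both endpoints already share a root, the edge would close a cycle.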
> but the bulk of the complexity of Kruskal's comes from deleting the minimum element of the priority queue E times
This is also wrong. Kruskal's algorithm does not involve any priority queue. It sorts the edges by cost at the beginning; the complexity of that step is the same as the one you mention. However, sorting is often faster in practice than a priority queue (using a priority queue is, at best, equivalent to a heap sort, which is not the fastest sorting algorithm).
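To make the contrast concrete, here is a small sketch (with a made-up edge list) of the two ways to produce edges in increasing cost order. Both are O(E log E), but the up-front sort is the textbook approach and typically has better constant factors:

```python
import heapq

edges = [(4, 'a', 'b'), (1, 'b', 'c'), (3, 'a', 'c')]  # (cost, u, v)

# Textbook Kruskal: sort all edges once, O(E log E).
by_sort = sorted(edges)

# Priority-queue alternative: heapify then pop E times, also O(E log E),
# but this is essentially a heap sort.
heap = list(edges)
heapq.heapify(heap)
by_heap = [heapq.heappop(heap) for _ in range(len(heap))]

assert by_sort == by_heap  # same order, different constant factors
```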
Bottom line, if m is the number of edges and n the number of nodes:

- Sorting the edges: O(m log m).
- For each edge, calling union-find: O(m * alpha(n)), or basically just O(m).
- Total complexity: O(m log m + m * alpha(n)).
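Putting the two steps together, here is a sketch of the full algorithm. It inlines a minimal union-find (path compression only, which is enough for the near-linear bound in practice); edges are assumed to be (cost, u, v) tuples over vertices 0..n-1.

```python
def kruskal(n, edges):
    """Return (total cost, MST edge list) for an undirected weighted graph."""
    parent = list(range(n))

    def find(x):
        # Path compression via halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for cost, u, v in sorted(edges):      # O(m log m)
        ru, rv = find(u), find(v)         # O(alpha(n)) amortized per call
        if ru != rv:                      # edge does not close a cycle
            parent[ru] = rv               # union the two components
            mst.append((u, v))
            total += cost
    return total, mst
```

For example, on a triangle with edge costs 1, 2, 3, the algorithm keeps the two cheapest edges and rejects the third because its endpoints are already connected.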
If you don't use union-find, the total complexity will be O(m log m + m * (n + m)), using your O(n + m) cycle-finding algorithm. And O(n + m) for this step is probably an understatement, since you must also update your structure somehow (insert an edge). The naive disjoint-set algorithm is actually O(n log n), so even worse.
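For comparison, here is a sketch of the DFS-based check the question describes: before adding edge (u, v) to the forest built so far, test whether v is already reachable from u. Each call walks the current forest, which is where the O(n + m) per-edge cost comes from. The adjacency structure and function name are illustrative.

```python
from collections import defaultdict

def creates_cycle(adj, u, v):
    """DFS from u in the current forest; adding (u, v) creates a cycle
    iff v is already reachable from u."""
    stack, seen = [u], {u}
    while stack:
        node = stack.pop()
        if node == v:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```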
Note: in this case, you can write log n instead of log m if you prefer, because m = O(n^2) and log(n^2) = 2 log n.
In conclusion: yes, union-find helps a lot.
Even if you use the O(log n) variant of union-find, which leads to O(m log m + m log n) total complexity (and which you could fold into O(m log m)), in practice you'd rather keep the second part faster if you can. Since union-find is very easy to implement, there's really no reason not to use it.