Does using union-find in Kruskal's algorithm actually affect worst-case runtime?

So I'm teaching myself some graph algorithms, now on Kruskal's, and understand that it's recommended to use union-find so checking whether adding an edge creates a cycle only takes O(log V) time. For practical purposes, I see why you'd want to, but strictly in terms of Big O notation, does doing so actually affect the worst-case complexity?

My reasoning: if instead of union-find we did a DFS to check for cycles, the runtime for that would be O(E + V), and you have to perform that V times for a runtime of O(V^2 + VE). It's more than with union-find, which would be O(V log V), but the bulk of the complexity of Kruskal's comes from deleting the minimum element of the priority queue E times, which is O(E log E), so that's the Big O answer either way. I don't really see a space advantage either, since the union-find takes O(V) space and so do the data structures you need to maintain to find a cycle using DFS.
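For reference, a rough sketch of the DFS check I have in mind (Python, my own naming; the forest is an adjacency list that I'd have to update every time an edge is accepted):

    from collections import defaultdict

    def creates_cycle(adj, u, v):
        # Return True if u and v are already connected in the forest,
        # i.e. adding edge (u, v) would create a cycle.
        stack, seen = [u], {u}
        while stack:
            node = stack.pop()
            if node == v:
                return True
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    forest = defaultdict(list)
    if not creates_cycle(forest, 0, 1):
        forest[0].append(1)
        forest[1].append(0)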

So a probably overly long explanation for a simple question: Does using union-find in Kruskal's algorithm actually affect worst-case runtime?

asked Aug 16 '15 by Chirayu Poudel



1 Answer

and understand that it's recommended to use union-find so checking whether adding an edge creates a cycle only takes O(log V) time

This isn't right. Performing m union-find operations takes O(m * alpha(n)) amortized time, where alpha(n) is the inverse of the Ackermann function and, for all intents and purposes, can be considered constant. So it's much faster than logarithmic:

Since alpha(n) is the inverse of this function, alpha(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
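For concreteness, here is a minimal union-find sketch (Python, illustrative names) using union by size plus path halving, a simple form of path compression; this pairing is one of the heuristic combinations known to achieve the inverse-Ackermann bound:

    class DisjointSet:
        def __init__(self, n):
            self.parent = list(range(n))
            self.size = [1] * n

        def find(self, x):
            # Path halving: point every other visited node at its grandparent.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return False  # same component already: the edge would close a cycle
            if self.size[ra] < self.size[rb]:
                ra, rb = rb, ra  # attach the smaller tree under the larger one
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]
            return True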


but the bulk of the complexity of Kruskal's comes from deleting the minimum element of the priority queue E times

This is also wrong. Kruskal's algorithm does not involve any priority queues. It sorts the edges by cost once at the beginning. The complexity of that step is still the O(E log E) you mention, but sorting is likely faster in practice than a priority queue: popping every element out of a priority queue is, at best, equivalent to a heap sort, which is not the fastest sorting algorithm.
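A quick way to see that for yourself is a micro-benchmark sketch like the following (Python; the 10^6 edge count is arbitrary, and actual timings will vary by machine and runtime):

    import heapq, random, time

    edges = [(random.random(), i, i + 1) for i in range(1_000_000)]

    t = time.perf_counter()
    sorted(edges)                # one up-front sort of all edges
    print("sort:", time.perf_counter() - t)

    t = time.perf_counter()
    heap = edges[:]
    heapq.heapify(heap)
    while heap:                  # E deletions of the minimum == a heap sort
        heapq.heappop(heap)
    print("heap:", time.perf_counter() - t)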

Bottom line, if m is the number of edges and n the number of nodes (see the sketch after this list):

  1. Sorting the edges: O(m log m).

  2. For each edge, calling union-find: O(m * alpha(n)), or basically just O(m).

  3. Total complexity: O(m log m + m * alpha(n)).

  4. If you don't use union-find, the total complexity becomes O(m log m + m * (n + m)) with your O(n + m) cycle-finding algorithm. And O(n + m) is probably an understatement for this step, since you must also update your structure somehow (insert an edge) whenever you accept one. Even the naive disjoint-set algorithm comes in at O(n log n) overall, so the comparison is even worse for the DFS approach.

Note: in this case, you can write log n instead of log m if you prefer, because m = O(n^2) and log(n^2) = 2 log n.
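Putting the steps together, a sketch of Kruskal's algorithm (Python, reusing the DisjointSet sketch from above; the (cost, u, v) edge format is just an illustrative convention):

    def kruskal(n, edges):
        # n vertices labelled 0..n-1, edges as (cost, u, v) tuples.
        ds = DisjointSet(n)
        mst = []
        for cost, u, v in sorted(edges):   # step 1: sort, O(m log m)
            if ds.union(u, v):             # step 2: union-find, O(alpha(n)) amortized
                mst.append((u, v, cost))
                if len(mst) == n - 1:      # a spanning tree has exactly n - 1 edges
                    break
        return mst

    edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
    print(kruskal(4, edges))  # [(0, 1, 1), (1, 3, 2), (1, 2, 3)], total cost 6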

In conclusion: yes, union-find helps a lot.

Even if you use the O(log n) variant of union-find, the total complexity would be O(m log m + m log n), which you could simplify to O(m log m); still, in practice you'd rather keep the second part faster if you can. Since union-find is very easy to implement, there's really no reason not to.

answered Nov 15 '22 by IVlad