I am trying to implement Kruskal as efficient as possible.
For runtime efficiency, is there a difference in using a heap or a sorting algorithm for sorting the edges?
What other techniques are there for making the Kruskal algorithm work more efficiently?
Class KruskalElem is used to store the edges on the min-heap. Kruskal's algorithm is dominated by the time required to process the edges. The differ and UNION functions are nearly constant in time if path compression and weighted union is used.
Explanation: Kruskal's algorithm uses a greedy algorithm approach to find the MST of the connected weighted graph. In the greedy method, we attempt to find an optimal solution in stages.
The time of Kruskal algorithm is O(e log e) and its the time for sorting the edges. If you can do it in O(e), considering the rest of algorithm to find the minimum spanning tree is O (e log n), you have O(e) + O(e log n) . Since e=O(n^2) then the algorithm time would be O(n^2 log n) or O(e log n) .
The advantage of Prim's algorithm is its complexity, which is better than Kruskal's algorithm. Therefore, Prim's algorithm is helpful when dealing with dense graphs that have lots of edges. However, Prim's algorithm doesn't allow us much control over the chosen edges when multiple edges with the same weight occur.
It depends on the exact problem you are trying to solve. In case you are implementing a generic solution, just choose the 'fastest' sorting algorithm. I doubt that is heapsort though. I would just use whatever Java is using by default as a sorting algorithm (probably timsort, if you are sorting objects). More, in some cases sorting can be done faster than O(ElogE)
. Say your edges can only have integer weights residing in a small interval, then maybe you can go for something really similar to countsort. So if you are in one of those cases maybe a heap is far from a good choice. Also, I cannot see any reason why someone would use a heap in the context of Kruskal's algorithm solely.
To answer your second question (but you might know this already) a good speed up is given by the usage of Disjoint-set data structure for the operations on sets. It comes with all sort of advantages: easy to implement, good asymptotic behavior and low constant.
EDIT
I have reconsidered the heap/heapsort option, mainly due to the comments on my post. Using a heap might bring a huge advantage indeed if only sorting until the tree is complete. 180 degrees turn on my opinion. Here is the reason.
Consider the Erdős–Rényi model. Now, this is a very simplistic model in which one starts with an empty graph G
on n
vertices (i.e. no edges) and adds each possible edge with probability p
to G
, independent from any other edge. This is not exactly what Kruskal's algorithm does when composing the tree, but it resembles it 'pretty well' if G
has quadratic number of edges (in terms of the number of vertices), the edge distribution is not 'biased' and the weight assignment is not 'biased'.
Now here comes the interesting part. Under the Erdős–Rényi model, the graph becomes connected when p
is approximately ln(n)/n
(i.e. 'roughly' speaking, after adding O(nln(n))
edges to the graph). The result is well known for some time (check here).
Though, again, the setup is different for the Kruskal's algorithm, if G
has quadratic number of edges (in terms of the number of vertices), the edge distribution is not 'biased' and the weight assignment is not 'biased', it is plausible that a tree is reachable within O(nln(n))
edges. If this is true indeed, it makes using a heap and only sorting until the tree is complete better than sorting the entire set of edges using a comparison sort method before starting composing the tree.
So using a heap probably brings runtime speed up also and it might be considerable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With