Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to find MST in a huge complete graph

Let's assume a complete graph of > 25000 nodes. Each node is essentially a point on a plane. It has 625M edges. Each edge has length which should be stored as a floating point number.

I need an algorithm to find its MST (on a usual PC).

If I take Kruskal's algorithm, it needs to sort all edges first, but I cannot afford even store the edges altogether in memory at the same time.

If I choose Prim's algorithm, it's quite difficult to evaluate how many edges will be stored in a heap at the same time, but probably the most of them will be there very soon after algorithm starts.

Is there any more memory-sufficient algorithm which can allow me to avoid sorting edges stored in a file?

Also, are there any known MST algorithms which utilize the fact that any tree edges satisfy triangle inequality?

like image 648
Roman Avatar asked Jul 03 '13 14:07

Roman


2 Answers

You can still use Kruskal's algorithm.

You don't actually need to sort the edges, what the algorithm requires is simply a method for repeatably finding the smallest weight edge that hasn't already been used. Presorting the edges and iterating through that list is simply a very efficient way of doing so.

You can do the same thing simply by repeatably find the k-smallest unused edges (where k is a manageable number, probably at least |V|), then sort and iterate through them instead as needed. This breaks the sorting process down into more manageable segments, although there is a time-space tradeoff as depending on how large k is the time complexity of this process can be anywhere from O(E log E) (k = E) to about O(E^2) (k = 1).

like image 51
Nuclearman Avatar answered Oct 13 '22 13:10

Nuclearman


Boruvka's algorithm makes a logarithmic number of passes on the unsorted edge list. The memory required is proportional to the number of nodes.

like image 24
David Eisenstat Avatar answered Oct 13 '22 12:10

David Eisenstat