 

Fastest implementation for All-pairs shortest paths problem?

I have a weighted graph with 30k nodes and 160k edges and no negative weights. I would like to compute the shortest paths from every node to every other node. I don't think I can assume any particular heuristics to simplify the problem.

I tried this Dijkstra C implementation http://compprog.wordpress.com/2007/12/01/one-source-shortest-path-dijkstras-algorithm/, which solves the single-source shortest path problem, calling its dijkstras() function once for each of my 30k nodes. As you can imagine, it takes ages. At the moment I don't have the time to write and debug the code myself; I have to compute these paths as soon as possible and store them in a database, so I'm looking for a faster, ready-to-use solution. Do you have any tips?

I have to run this on a recent MacBook Pro with 8 GB of RAM, and I would like to find a solution that takes no more than 24 hours to finish the computation.

Thanks a lot in advance!!

Eugenio

asked Aug 23 '11 by Eugenio


2 Answers

I looked over the Dijkstra's algorithm link that you posted in the comments and I believe that it's the source of your inefficiency. Inside the inner Dijkstra's loop, it's using an extremely unoptimized approach to determine which node to explore next (a linear scan over every node at each step). The problematic code is in two spots. The first is this code, which tries to find the next node to operate on:

/* find the closest unvisited node by scanning every node: O(n) per step */
mini = -1;
for (i = 1; i <= n; ++i)
    if (!visited[i] && ((mini == -1) || (d[i] < d[mini])))
        mini = i;

Because this code is nested inside of a loop that visits every node, the complexity (as mentioned in the link) is O(|V|²), where |V| is the number of nodes. In your case, since |V| is 30,000, there will be nine hundred million iterations of this loop overall. This is painfully slow (as you've seen), and there's no reason to do that much work.

Another trouble spot is this code, which looks for edges that can be used to reduce the cost of other nodes:

/* relax neighbors by scanning an entire adjacency-matrix row: O(n) per step */
for (i = 1; i <= n; ++i)
    if (dist[mini][i])  /* nonzero entry means an edge exists */
        if (d[mini] + dist[mini][i] < d[i])
            d[i] = d[mini] + dist[mini][i];

This scans over an entire row in the adjacency matrix looking for nodes to consider, which takes time O(n) irrespective of how many outgoing edges leave the node.
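
Switching from an adjacency matrix to an adjacency list fixes this, since relaxing a node's neighbors then costs O(deg(v)) rather than O(n). Here is a minimal sketch of what that could look like (the Edge type and names are mine for illustration, not from the linked code):

#include <stddef.h>

typedef struct Edge {
    int to;             /* index of the neighboring node */
    int weight;         /* non-negative edge weight */
    struct Edge *next;  /* next edge leaving the same node */
} Edge;

/* Relax every edge leaving node v, given tentative distances d[]. */
void relax_neighbors(int v, Edge *adj[], int d[]) {
    for (Edge *e = adj[v]; e != NULL; e = e->next)
        if (d[v] + e->weight < d[e->to])
            d[e->to] = d[v] + e->weight;
}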

While you could try fixing up this version of Dijkstra's into a more optimized implementation, I think the correct option here is just to throw this code away and find a better implementation of Dijkstra's algorithm. For example, if you use the pseudocode from the Wikipedia article implemented with a binary heap, you can get Dijkstra's algorithm running in O(|E| log |V|) time. In your case, this value is just over two million, which is about 450 times smaller than the nine hundred million steps of your current approach. That's a huge difference, and I'm willing to bet that a better Dijkstra's implementation will complete in substantially less time than before.
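
To make that concrete, here is a rough sketch of a heap-based Dijkstra's in C, reusing the Edge adjacency list from the sketch above. It is illustrative rather than a transcription of the Wikipedia pseudocode: instead of a decrease-key operation it uses lazy deletion (stale heap entries are skipped when popped), which keeps the heap code short at the cost of a somewhat larger heap:

#include <limits.h>
#include <stdlib.h>

typedef struct Edge { int to, weight; struct Edge *next; } Edge;

typedef struct { int dist, node; } HeapItem;
typedef struct { HeapItem *a; int size; } Heap;

/* sift a new entry up from the bottom of the binary min-heap */
void heap_push(Heap *h, int dist, int node) {
    int i = h->size++;
    while (i > 0 && h->a[(i - 1) / 2].dist > dist) {
        h->a[i] = h->a[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->a[i].dist = dist;
    h->a[i].node = node;
}

/* remove and return the minimum entry, sifting the last one down */
HeapItem heap_pop(Heap *h) {
    HeapItem top = h->a[0], last = h->a[--h->size];
    int i = 0, c;
    while ((c = 2 * i + 1) < h->size) {
        if (c + 1 < h->size && h->a[c + 1].dist < h->a[c].dist)
            c++;
        if (last.dist <= h->a[c].dist)
            break;
        h->a[i] = h->a[c];
        i = c;
    }
    h->a[i] = last;
    return top;
}

/* Single-source shortest paths from src into d[0..n-1].
   n = node count, m = edge count; n + m bounds the heap size because
   each push after the first follows a strict improvement of some d[v]. */
void dijkstra_heap(int n, int m, Edge *adj[], int src, int d[]) {
    Heap h = { malloc((size_t)(n + m) * sizeof(HeapItem)), 0 };
    for (int i = 0; i < n; i++)
        d[i] = INT_MAX;
    d[src] = 0;
    heap_push(&h, 0, src);
    while (h.size > 0) {
        HeapItem it = heap_pop(&h);
        if (it.dist > d[it.node])
            continue;  /* stale entry; this node is already settled */
        for (Edge *e = adj[it.node]; e != NULL; e = e->next)
            if (d[it.node] + e->weight < d[e->to]) {
                d[e->to] = d[it.node] + e->weight;
                heap_push(&h, d[e->to], e->to);
            }
    }
    free(h.a);
}

With |V| = 30,000 and |E| = 160,000, each single-source run should take well under a second, so 30,000 runs ought to fit comfortably inside your 24-hour budget.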

On top of this, you might want to consider running all the Dijkstra searches in parallel, as Jacob Eggers has pointed out. This can get you an extra speed boost for each processor that you have. Combined with the above (and more critical) fix, this should probably give you a huge performance increase.
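
Since the 30,000 single-source runs are completely independent, parallelizing them is just a matter of splitting the source nodes across threads. A hypothetical OpenMP sketch, assuming the Edge type and dijkstra_heap() from above (compile with -fopenmp):

typedef struct Edge { int to, weight; struct Edge *next; } Edge;

void dijkstra_heap(int n, int m, Edge *adj[], int src, int d[]);  /* from the sketch above */

/* Run one Dijkstra per source node; each thread writes only its own row. */
void all_pairs(int n, int m, Edge *adj[], int *rows[]) {
    #pragma omp parallel for schedule(dynamic)
    for (int src = 0; src < n; src++)
        dijkstra_heap(n, m, adj, src, rows[src]);
}

One caveat: the full 30,000 × 30,000 result is roughly 3.4 GB of ints, so with 8 GB of RAM you will probably want to flush each finished row to your database instead of holding the whole matrix in memory.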

If you plan on running this algorithm on a much denser data set (one where the number of edges approaches |V|² / log |V|), then you may want to consider switching to the Floyd-Warshall algorithm. Running Dijkstra's algorithm once per node (sometimes called Johnson's algorithm) takes O(|V| |E| log |V|) time, while using Floyd-Warshall takes O(|V|³) time. However, for the data set you've mentioned, the graph is sufficiently sparse that running multiple Dijkstra's instances should be fine.
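
For reference, Floyd-Warshall itself is only a few lines. A minimal sketch, assuming dist has been pre-filled with the direct edge weights, 0 on the diagonal, and a large "infinity" sentinel well below INT_MAX (say, INT_MAX / 2) so the additions below can't overflow:

/* O(|V|^3) time and O(|V|^2) memory: at |V| = 30,000 that is about
   2.7e13 relaxations and a ~3.4 GB matrix, which is why repeated
   Dijkstra's is the better fit for this particular graph. */
void floyd_warshall(int n, int **dist) {
    for (int k = 0; k < n; k++)  /* allow node k as an intermediate hop */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}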

Hope this helps!

answered Sep 27 '22 by templatetypedef


How about the Floyd-Warshall algorithm?

answered Sep 27 '22 by Aaron Klotz