Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is is possible to implemet all-pairs shortest path algorithm with parallel framework in large graph?

With spark graphx pregel api, it is easy to compute single source shortest path in large graph, for example millions vertices and tens millions edges, with acceptable running time, for example sevral hours. But is it possible to run all-pairs shortest path in large graph in acceptable running time?

like image 558
bourneli Avatar asked May 12 '16 02:05

bourneli


1 Answers

Graphs having millions of vertices can be easily processed on single machine, as long as it has sufficient memory, so there is no need to pay the penalty introduced by scaling out and many modern libraries, are heavily optimized and can utilize modern hardware.

In contrast, distributed solutions are usually limited by inter-node communication and exact algorithms just don't scale well. It is possible to improve things significantly with approximations and heuristics and leveraging a priori knowledge about the structure of data.

(Opinion alert) Personally I would steer away as far from possible from graph processing on Spark:

  • GraphX has been effectively abandoned few years ago. It also shows very poor scaling capabilities, according to Facebook's study
  • Grapframes are immature and inefficient.
like image 181
user8924182 Avatar answered Jan 02 '23 13:01

user8924182