With the Spark GraphX Pregel API it is easy to compute single-source shortest paths in a large graph (say, millions of vertices and tens of millions of edges) with an acceptable running time, for example several hours. But is it possible to run all-pairs shortest paths on a large graph in an acceptable running time?
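For reference, this is the kind of single-source shortest path job I mean; a minimal sketch following the standard GraphX Pregel pattern (the randomly generated graph and the source vertex id are placeholders for illustration only):

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.util.GraphGenerators
import org.apache.spark.sql.SparkSession

object SSSPExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SSSP").getOrCreate()
    val sc = spark.sparkContext

    // Placeholder graph; in practice the edges would be loaded from storage.
    val graph: Graph[Long, Double] =
      GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)
    val sourceId: VertexId = 42L // arbitrary source vertex

    // All vertices start at infinity except the source.
    val initialGraph = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val sssp = initialGraph.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),   // vertex program: keep the shorter distance
      triplet => {                                      // send message along edges that improve the distance
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else
          Iterator.empty
      },
      (a, b) => math.min(a, b)                          // merge incoming messages
    )

    println(sssp.vertices.take(10).mkString("\n"))
    spark.stop()
  }
}
```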
Graphs with millions of vertices can easily be processed on a single machine, as long as it has sufficient memory, so there is no need to pay the penalty introduced by scaling out. Many modern single-machine libraries are heavily optimized and can make full use of modern hardware.
In contrast, distributed solutions are usually limited by inter-node communication, and exact algorithms simply don't scale well. Things can be improved significantly with approximations and heuristics, and by leveraging a priori knowledge about the structure of the data.
(Opinion alert) Personally, I would steer as far away as possible from graph processing on Spark.