Get all the nodes connected to a node in Apache Spark GraphX

Suppose we have the following input in Apache Spark GraphX:

Vertex RDD:

val vertexArray = Array(
  (1L, "Alice"),
  (2L, "Bob"),
  (3L, "Charlie"),
  (4L, "David"),
  (5L, "Ed"),
  (6L, "Fran")
)

Edge RDD:

import org.apache.spark.graphx.Edge

val edgeArray = Array(
  Edge(1L, 2L, 1),
  Edge(2L, 3L, 1),
  Edge(3L, 4L, 1),
  Edge(5L, 6L, 1)
)

I need all the nodes connected to a node, grouped by connected component, like this:

1,[1,2,3,4]
5,[5,6]
asked Sep 16 '15 at 02:09 by Ajay Gupta


1 Answer

You can use ConnectedComponents (graph.connectedComponents), which returns

a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.

and then reshape the results:

graph.connectedComponents.vertices.map(_.swap).groupByKey
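If you want to see what this pipeline produces without a Spark cluster, here is a minimal plain-Scala sketch of the same two steps. It assumes ordinary collections in place of RDDs and (src, dst) tuples in place of GraphX Edge objects. The names ComponentsSketch, minLabels, groupByComponent and render are hypothetical. GraphX's own connectedComponents runs distributed (its implementation is built on Pregel), whereas this sketch is a sequential fixed-point loop that arrives at the same "lowest vertex id in the component" labelling.

```scala
// Spark-free sketch. Assumption: plain Scala collections stand in for RDDs,
// and (src, dst) tuples stand in for GraphX Edge objects.
object ComponentsSketch {
  type VertexId = Long

  // Step 1: emulate connectedComponents by repeatedly pushing the smaller
  // label across every edge (edges treated as undirected) until nothing changes.
  // Assumes every edge endpoint also appears in `vertices`.
  def minLabels(vertices: Seq[VertexId], edges: Seq[(VertexId, VertexId)]): Map[VertexId, VertexId] = {
    var labels: Map[VertexId, VertexId] = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        val m = math.min(labels(src), labels(dst))
        if (labels(src) != m) { labels = labels.updated(src, m); changed = true }
        if (labels(dst) != m) { labels = labels.updated(dst, m); changed = true }
      }
    }
    labels
  }

  // Step 2: emulate .map(_.swap).groupByKey on the (vertexId, componentId) pairs.
  // Each group is sorted because Spark's groupByKey makes no ordering promise
  // for the values within a group.
  def groupByComponent(labels: Map[VertexId, VertexId]): Map[VertexId, Seq[VertexId]] =
    labels.toSeq
      .map(_.swap)
      .groupBy(_._1)
      .map { case (component, pairs) => component -> pairs.map(_._2).sorted }

  // Render the groups in the "1,[1,2,3,4]" shape used in the question.
  def render(groups: Map[VertexId, Seq[VertexId]]): Seq[String] =
    groups.toSeq.sortBy(_._1).map { case (c, vs) => s"$c,[${vs.mkString(",")}]" }
}
```

With the question's vertices 1 to 6 and edges (1,2), (2,3), (3,4), (5,6), render(groupByComponent(minLabels(...))) yields 1,[1,2,3,4] and 5,[5,6]. On a real cluster you would keep the Spark one-liner and only call collect() on the grouped RDD if the result is small enough to fit on the driver.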
answered Nov 04 '22 at 22:11 by zero323