
how to get two-hop neighbors in spark-graphx?

I've created a directed graph, using graphx.

#src->dest
a  -> b  34
a  -> c  23
b  -> e  10
c  -> d  12
d  -> c  12
c  -> d  11

I want to get all two hop neighbors like this:

a  -> e  44
a  -> d  34

My graph is very large, so I would like to do it elegantly and efficiently. Does anyone have any advice on what will be the best way to do that over a graph instance?

asked Oct 08 '16 by leslie chu



1 Answer

You can succinctly express this using the GraphFrames library. First you have to include the required package. For Spark 2.0 with Scala 2.11 you can add

graphframes:graphframes:0.2.0-spark2.0-s_2.11

to spark.jars.packages in conf/spark-defaults.conf, or pass it as a --packages argument to spark-submit.
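For interactive use, the same coordinate can also be passed directly on the command line (shown here for spark-shell; the coordinate is the Spark 2.0 / Scala 2.11 one from above):

```shell
# Pull in GraphFrames for an interactive session
spark-shell --packages graphframes:graphframes:0.2.0-spark2.0-s_2.11
```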

Next you should convert the GraphX Graph to a GraphFrame. You can use the fromGraphX method:

import org.graphframes.GraphFrame
import org.apache.spark.graphx._

val nodes = sc.parallelize(Seq(
  (1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e")))

val edges = sc.parallelize(Seq(
   Edge(1L, 2L, 34), Edge(1L, 3L, 23), Edge(2L, 5L, 10),
   Edge(3L, 4L, 12), Edge(4L, 3L, 12), Edge(3L, 4L, 11)))

val graph = Graph(nodes, edges)

val graphFrame = GraphFrame.fromGraphX(graph)

GraphFrame provides a find method which takes a pattern in a language similar to Cypher. A two-hop path can be expressed as:

val pattern = "(x1) - [a] -> (x2); (x2) - [b] -> (x3)"

where (_) represents nodes and [_] represents edges. You can find paths matching the pattern:

val paths = graphFrame.find(pattern)

and select the fields of interest:

paths.select($"x1.attr", $"x3.attr", $"a.attr" + $"b.attr").show()
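If pulling in GraphFrames is not an option, the same two-hop computation can also be expressed as a self-join of the edge list on the intermediate vertex (in Spark terms, graph.edges keyed by dstId joined against graph.edges keyed by srcId, summing the weights). Here is a minimal sketch of that join logic on plain Scala collections, using the edge list from the question; the object and method names are illustrative:

```scala
object TwoHopSketch {
  // (src, dst, weight)
  type WeightedEdge = (String, String, Int)

  // Two-hop neighbors: join every edge s1 -> d1 with every edge
  // s2 -> d2 whose source s2 is the first edge's destination d1,
  // and add the weights. In Spark this becomes an RDD join on the
  // intermediate vertex id instead of a nested loop.
  def twoHop(edges: Seq[WeightedEdge]): Seq[WeightedEdge] =
    for {
      (s1, d1, w1) <- edges
      (s2, d2, w2) <- edges
      if d1 == s2
    } yield (s1, d2, w1 + w2)

  def main(args: Array[String]): Unit = {
    val edges = Seq(
      ("a", "b", 34), ("a", "c", 23), ("b", "e", 10),
      ("c", "d", 12), ("d", "c", 12), ("c", "d", 11))
    // Includes ("a", "e", 44) via b and ("a", "d", 34) via c
    twoHop(edges).foreach(println)
  }
}
```

Note that on a large graph the RDD version of this join is typically cheaper than motif finding, but it also returns every two-hop pair, including round trips such as c -> d -> c, which you may want to filter out.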
answered Sep 22 '22 by zero323