I am new to the graph world and have been assigned to work on graph processing. I already know Apache Spark, so I thought of using its GraphX library to process a large graph. Then I came across Gephi, which provides a nice GUI to manipulate graphs.
Does GraphX have such tools, or is it mainly a parallel graph processing library? Can I import JSON graph data exported from Gephi into GraphX?
What is Spark GraphX? GraphX is the Spark API for graphs and graph-parallel computation. It includes a growing collection of graph algorithms and builders to simplify graph analytics tasks. GraphX extends the Spark RDD with a Resilient Distributed Property Graph.
GraphX offers flexibility and works seamlessly with both graphs and collections, so you can view the same data as either a graph or a collection. It unifies ETL, exploratory analysis, and iterative graph computation within a single system, integrating graph processing with Spark's data processing pipelines.
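For example, a small property graph can be built from vertex and edge RDDs and then inspected either as a graph or as plain collections. Below is a minimal sketch using GraphX's Scala API (GraphX does not have a Python API); the vertex names, edge labels, and the existing SparkContext sc are assumptions for illustration:

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// assumes an existing SparkContext `sc` (e.g. the one provided by spark-shell)
// hypothetical vertex data: (id, name)
val users: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))

// hypothetical edge data with a string property
val relationships: RDD[Edge[String]] =
  sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))

// build the Resilient Distributed Property Graph
val graph = Graph(users, relationships)

// view the same data as collections: RDDs of vertices and edges
graph.vertices.collect().foreach(println)
graph.edges.collect().foreach(println)

// run one of the built-in algorithms, e.g. connected components
val cc = graph.connectedComponents().vertices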
On top of the Core API, Spark offers an integrated set of high-level libraries that can be used for specialized tasks such as graph processing or machine learning.
Apart from this, GraphX exhibits large performance variance, and Giraph is more memory-efficient: the total memory required to run jobs on a graph of a given size is a few times lower for Giraph than for GraphX. As a consequence, Giraph requires fewer machine hours to process the same graph, making it more efficient overall.
Adding to that, you can also try GraphLab Create: https://dato.com/products/create/open_source.html
It directly supports Spark RDDs: https://dato.com/learn/userguide/data_formats_and_sources/spark_integration.html
Not much work is required after that:
from pyspark import SparkContext
import graphlab as gl

sc = SparkContext('yarn-client')           # connect to the cluster via YARN in client mode
t = sc.textFile("hdfs://some/large/file")  # load the raw data as a Spark RDD
sf = gl.SFrame.from_rdd(t)                 # convert the RDD into a GraphLab SFrame
# do stuff...
out_rdd = sf.to_rdd(sc)                    # convert the result back into a Spark RDD