
Does Spark GraphX have visualization like Gephi?

I am new to the graph world. I have been assigned to work on graph processing. I already know Apache Spark, so I thought of using its GraphX library to process large graphs. Then I came across Gephi, which provides a nice GUI to manipulate graphs.

Does GraphX have such tools, or is it mainly a parallel graph processing library? Can I import JSON graph data exported from Gephi into GraphX?

Umesh K asked Jun 18 '15 08:06


People also ask

What is Spark GraphX used for?

What is Spark GraphX? GraphX is the Spark API for graphs and graph-parallel computation. It includes a growing collection of graph algorithms and builders to simplify graph analytics tasks. GraphX extends the Spark RDD with a Resilient Distributed Property Graph.

What is unique feature of GraphX?

GraphX offers flexibility and works seamlessly with both graphs and collections. Hence, you can view the same data as graphs or collections. It unifies ETL, exploratory analysis, and iterative graph framework computation within a single system. It incorporates Spark data processing pipelines with graph processing.
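To illustrate the "graphs or collections" idea, here is a minimal conceptual sketch in plain Python (GraphX's actual API is Scala, with VertexRDD and EdgeRDD; all names and fields below are illustrative, not GraphX calls). A property graph is just two collections, vertices with attributes and edges with attributes, so the same data can be queried either as records or as a graph:

```python
# A minimal sketch of a property graph as two plain collections.
# GraphX's real API is Scala (VertexRDD/EdgeRDD); these names are illustrative.

vertices = {
    1: {"name": "alice", "role": "student"},
    2: {"name": "bob", "role": "professor"},
    3: {"name": "carol", "role": "student"},
}

edges = [
    (1, 2, {"relation": "advised_by"}),
    (3, 2, {"relation": "advised_by"}),
]

# "Collection view": filter edges like any list of records
advised = [(src, dst) for src, dst, attr in edges
           if attr["relation"] == "advised_by"]

# "Graph view": compute each vertex's in-degree from the same data
in_degree = {vid: 0 for vid in vertices}
for _, dst, _ in edges:
    in_degree[dst] += 1

print(advised)       # [(1, 2), (3, 2)]
print(in_degree[2])  # 2
```

GraphX generalizes this by distributing both collections across the cluster as RDDs, so the same transformations scale to large graphs.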

Does Apache Spark support graph processing?

On top of the Core API, Spark offers an integrated set of high-level libraries that can be used for specialized tasks such as graph processing or machine learning.

How is GraphX different when compared to Giraph?

GraphX also exhibits large performance variance, and Giraph is more memory-efficient: the total memory required to run jobs on a fixed graph size is a few times lower for Giraph than for GraphX. As a consequence, Giraph requires fewer machine hours to process the same graph, making it more efficient overall.


1 Answer

In addition, you can also try GraphLab Create: https://dato.com/products/create/open_source.html

It directly supports Spark RDDs: https://dato.com/learn/userguide/data_formats_and_sources/spark_integration.html

Not much work is required after that:

from pyspark import SparkContext
import graphlab as gl

# Connect to the cluster (here running Spark on YARN in client mode)
sc = SparkContext('yarn-client')

# Load a large text file from HDFS as a Spark RDD
t = sc.textFile("hdfs://some/large/file")

# Convert the RDD into a GraphLab SFrame
sf = gl.SFrame.from_rdd(t)

# do stuff...

# Convert the result back into a Spark RDD
out_rdd = sf.to_rdd(sc)
Abhishek Choudhary answered Sep 20 '22 13:09