Neo4j or GraphX / Giraph: which to choose?


I have just started exploring graph processing methods and tools. What we basically do is compute standard metrics such as PageRank, clustering coefficient, triangle count, diameter, connectivity, etc. In the past I was happy with Octave, but when we started working with graphs of, say, 10^9 nodes/edges, we got stuck.

So the possible solutions are a distributed cloud setup built with Hadoop/Giraph, Spark/GraphX, Neo4j on top of them, etc.

But since I am a beginner, can someone advise what to actually choose? I do not understand when to use Spark/GraphX and when to use Neo4j. Right now I am considering Spark/GraphX, since it has a more Python-like syntax, while Neo4j has its own query language, Cypher. Visualization in Neo4j is cool but not useful at such a large scale. Is there a reason to use an additional layer of software (Neo4j), or should I just use Spark/GraphX? As I understand it, Neo4j will not save as much time as Giraph, GraphX or Hive do compared with pure Hadoop.

Thank you.


asked Feb 19 '15 14:02

Roman

1 Answer

Neo4j: It is a graph database that helps identify relationships and entities in data, usually read from disk. Its popularity and the reasons for choosing it are covered in this link. But when it has to process very large datasets and do real-time processing to produce graph results or representations, it needs to scale horizontally. In that case, combining Neo4j with Apache Spark gives significant performance benefits, with Spark serving as an external graph compute solution.

Mazerunner is a distributed graph processing platform that extends Neo4j. It uses a message broker to distribute graph processing jobs to the Apache Spark GraphX module.

GraphX: GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. It supports multiple graph algorithms.
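
To make the "directed multigraph with properties attached to each vertex and edge" idea concrete, here is a minimal sketch in plain Python. It is not the GraphX API (GraphX itself is used from Scala); the names `vertices`, `edges` and `pagerank`, and the example data, are made up for illustration. It stores vertex properties in a dict, keeps edges as a list so parallel edges are allowed, and runs a simple power-iteration PageRank on a single machine, assuming the usual damping factor of 0.85.

```python
from collections import defaultdict

# Vertex properties keyed by vertex id (hypothetical example data).
vertices = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}

# Edges as (src, dst, properties). A list allows duplicates, so this is a multigraph.
edges = [
    (1, 2, {"weight": 1.0}),
    (1, 2, {"weight": 2.0}),  # parallel edge between the same pair of vertices
    (1, 3, {"weight": 1.0}),
    (2, 3, {"weight": 1.0}),
    (3, 1, {"weight": 1.0}),
]


def pagerank(vertices, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; every parallel edge counts as a separate link."""
    n = len(vertices)
    out_degree = defaultdict(int)
    for src, _dst, _props in edges:
        out_degree[src] += 1

    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(ranks[v] for v in vertices if out_degree[v] == 0)
        new_ranks = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        # Each edge passes an equal share of its source's rank to its target.
        for src, dst, _props in edges:
            new_ranks[dst] += damping * ranks[src] / out_degree[src]
        ranks = new_ranks
    return ranks


ranks = pagerank(vertices, edges)
for vertex_id, rank in sorted(ranks.items()):
    print(vertices[vertex_id]["name"], round(rank, 4))
```

A distributed engine such as GraphX partitions the same two collections (vertices and edges) across machines and runs the per-edge update in parallel, which is what makes graphs with around 10^9 edges tractable.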

Conclusion: It is generally recommended to use a hybrid combination of Neo4j with GraphX, as the two are easy to integrate.

For real-time processing and for processing large datasets, use Neo4j with GraphX.
For simple persistence and for showing entity relationships in a simple graphical display, use standalone Neo4j.


answered Oct 18 '22 19:10

Praveen Kumar K S
