Apache Giraph vs Neo4j: Are the traversal algorithms across nodes totally different in these two graph processing systems? If we were to traverse, say, a social graph using Giraph and Neo4j on data stored on a single machine (not distributed), which would perform better, and why?
Neo4j has the most popular and active graph database community. It is well established, and reviewers report that the product is easy to learn and easy to use, with plenty of resources ranging from training materials to books.
Neo4j does, however, have scalability weaknesses when it comes to scaling writes, so if your application is expected to sustain very high write throughput, Neo4j is not a good fit.
Neo4j DBMS clusters: Neo4j supports clusters that provide high availability, read scalability, and failover, which are important to many enterprises. Neo4j clusters also maintain ACID transactions across all locations. Clusters are only available in Neo4j Enterprise Edition.
Hands down, Neo4j. Giraph's graph computations run as Hadoop jobs because Giraph is designed for large, distributed graphs. The overhead of managing these jobs is too large for them to be efficient on a small graph running on a pseudo-distributed, single-machine cluster.
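To make the difference in traversal model concrete: Giraph implements Google's Pregel model, in which a computation proceeds in "supersteps". In each superstep every active vertex processes the messages sent to it in the previous superstep and sends new messages to its neighbours, and all vertices synchronize at a barrier before the next superstep begins. Below is a minimal single-process Python simulation of a breadth-first search in that vertex-centric style. The function `pregel_bfs` and its message-passing loop are hypothetical illustrations of the model, not Giraph's Java API.

```python
from collections import defaultdict

INF = float("inf")


def pregel_bfs(edges, source):
    """Toy single-process simulation of a Pregel-style (vertex-centric, BSP)
    breadth-first search. Each iteration of the while-loop is one superstep.
    Assumes `source` appears in at least one edge."""
    # Build adjacency lists; the social graph is treated as undirected.
    neighbours = defaultdict(list)
    vertices = set()
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
        vertices.update((u, v))
    # Every vertex starts "infinitely" far from the source.
    distance = {v: INF for v in vertices}

    # First superstep: only the source has a message (its own distance, 0).
    inbox = {source: [0]}
    supersteps = 0
    while inbox:  # halt once no messages are in flight (every vertex has voted to halt)
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)  # combine incoming candidate distances
            if best < distance[vertex]:
                distance[vertex] = best
                # Tell each neighbour it is at most one hop further away.
                for n in neighbours[vertex]:
                    outbox[n].append(best + 1)
        inbox = outbox  # barrier: messages are only delivered in the next superstep
        supersteps += 1
    return distance, supersteps
```

For example, `pregel_bfs([("alice", "bob"), ("bob", "carol")], "alice")` assigns `carol` a distance of 2. Even in this toy, every hop of the search costs one full superstep ending in a global synchronization barrier. Across many machines that structure is what makes Giraph scale, but on a single machine the coordination buys nothing and only adds overhead.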
Not only that, but traversals are Neo4j's specialty. A big reason is that Neo4j stores each node's adjacent relationships as doubly linked lists in its store files on disk. Check out this blog entry:
http://digitalstain.blogspot.nl/2010/10/neo4j-internals-file-storage.html
It explains how Neo4j optimizes the way it stores the graph for fast graph operations such as traversals.
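Here is a simplified sketch of the record layout the blog post describes. It assumes one record per node, holding the id of that node's first relationship, and one record per relationship, holding its start and end nodes plus previous/next pointers in *each* endpoint's chain. The classes `ToyStore`, `NodeRecord` and `RelRecord` and all field names are hypothetical. Neo4j's real store files pack such fields into fixed-size byte records and also track relationship types and property chains, and later versions may lay things out differently.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class NodeRecord:
    first_rel: Optional[int] = None  # id of the first relationship in this node's chain


@dataclass
class RelRecord:
    start: int
    end: int
    # Each relationship sits in TWO doubly linked lists: the start node's and the end node's.
    start_prev: Optional[int] = None
    start_next: Optional[int] = None
    end_prev: Optional[int] = None
    end_next: Optional[int] = None


class ToyStore:
    """Hypothetical in-memory stand-in for record files; a record's id is its list index."""

    def __init__(self) -> None:
        self.nodes: list[NodeRecord] = []
        self.rels: list[RelRecord] = []

    def add_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def _link(self, node_id: int, rel_id: int) -> None:
        """Prepend rel_id to node_id's relationship chain, fixing both pointer directions."""
        node, rel = self.nodes[node_id], self.rels[rel_id]
        old_first = node.first_rel
        if rel.start == node_id:
            rel.start_next = old_first
        else:
            rel.end_next = old_first
        if old_first is not None:
            old = self.rels[old_first]
            if old.start == node_id:
                old.start_prev = rel_id
            else:
                old.end_prev = rel_id
        node.first_rel = rel_id

    def add_relationship(self, start: int, end: int) -> int:
        if start == end:
            raise ValueError("self-loops are not modelled in this sketch")
        self.rels.append(RelRecord(start, end))
        rel_id = len(self.rels) - 1
        self._link(start, rel_id)
        self._link(end, rel_id)
        return rel_id

    def neighbours(self, node_id: int) -> Iterator[int]:
        """Walk node_id's chain by following next-pointers; no index lookup per hop."""
        rel_id = self.nodes[node_id].first_rel
        while rel_id is not None:
            rel = self.rels[rel_id]
            if rel.start == node_id:
                yield rel.end
                rel_id = rel.start_next
            else:
                yield rel.start
                rel_id = rel.end_next
```

Expanding a node is pure pointer chasing from one record to the next, with no global index consulted per hop (what Neo4j calls index-free adjacency). The `prev` pointers are what make the lists doubly linked: they let a relationship be unlinked from both endpoints' chains without scanning them.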