
neo4j performance compared to mysql (how can it be improved?)

This is a follow-up to "can't reproduce/verify the performance claims in graph databases and neo4j in action books". I have updated the setup and tests, and don't want to change the original question too much.

The whole story (including scripts etc) is on https://baach.de/Members/jhb/neo4j-performance-compared-to-mysql

Short version: while trying to verify the performance claims made in the 'Graph Databases' book, I came to the following results (querying a random dataset containing n people, with 50 friends each):

My results for 100k people

depth    neo4j             mysql       python
1        0.010             0.000        0.000
2        0.018             0.001        0.000
3        0.538             0.072        0.009
4       22.544             3.600        0.330
5     1269.942           180.143        0.758

"*": single run only

My results for 1 million people

depth    neo4j             mysql       python
1        0.010             0.000        0.000
2        0.018             0.002        0.000
3        0.689             0.082        0.012
4       30.057             5.598        1.079
5     1441.397*          300.000        9.791

"*": single run only
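For reference, the shape of the test can be sketched in plain Python. This is a minimal sketch, not the script linked above: the names `build_graph` and `friends_at_depth`, the seed, and the small sizes are assumptions for illustration. It builds a random graph in which each person has a fixed number of outgoing friend links, then counts the distinct people reached by following exactly `depth` links. It does not enforce Cypher's rule that a relationship appears at most once per path, so counts may differ slightly from the Cypher query at larger depths.

```python
import random


def build_graph(n_people, n_friends, seed=0):
    """Return a dict mapping each person id to a list of friend ids.

    Each person gets n_friends distinct outgoing links to other people.
    """
    rng = random.Random(seed)
    graph = {}
    for person in range(n_people):
        # Sample from everyone except the person themselves (no self-links).
        others = rng.sample(range(n_people - 1), n_friends)
        graph[person] = [o if o < person else o + 1 for o in others]
    return graph


def friends_at_depth(graph, start, depth):
    """Count distinct people reached by following exactly `depth` links."""
    frontier = {start}
    for _ in range(depth):
        # Expand one hop: the union of the friend lists of the frontier.
        frontier = {f for person in frontier for f in graph[person]}
    return len(frontier)


if __name__ == "__main__":
    g = build_graph(n_people=1000, n_friends=50)
    for d in range(1, 4):
        print(d, friends_at_depth(g, 0, d))
```

Because the frontier is a set, each hop touches a person at most once, which is probably one reason an in-memory approach of this kind stays fast as the depth grows.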

Using Neo4j 1.9.2 on 64-bit Ubuntu, I have set up neo4j.properties with these values:

neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=2048M

and neo4j-wrapper.conf with:

wrapper.java.initmemory=1024
wrapper.java.maxmemory=8192

My query to neo4j looks like this (using the REST api):

start person=node:node_auto_index(noscenda_name="person123")
match (person)-[:friend]->()-[:friend]->(friend)
return count(distinct friend);

The node_auto_index is in place, obviously.
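The depth-2 query above generalises to the other depths in the tables by repeating the `-[:friend]->()` step. The sketch below builds such a query as a string and wraps it in a JSON body, with the start name passed as a parameter rather than spliced into the text. `friends_query` and `cypher_payload` are hypothetical helper names, and the endpoint named in the comment (`/db/data/cypher` on the 1.x server) is an assumption about the setup, not something taken from the linked scripts.

```python
import json


def friends_query(depth):
    """Build a Cypher 1.9-style query for friends exactly `depth` hops away."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # depth - 1 anonymous intermediate nodes, then the named end node.
    hops = "-[:friend]->()" * (depth - 1) + "-[:friend]->(friend)"
    return (
        "start person=node:node_auto_index(noscenda_name={name}) "
        "match (person)" + hops + " "
        "return count(distinct friend)"
    )


def cypher_payload(depth, name):
    """Return the JSON request body: query text plus its parameters."""
    return json.dumps({"query": friends_query(depth), "params": {"name": name}})


if __name__ == "__main__":
    # Assumed target: POST this body to http://localhost:7474/db/data/cypher
    # with Content-Type: application/json.
    print(cypher_payload(2, "person123"))
```

Passing the name as a parameter keeps the query text identical across start nodes, which should let the server reuse a parsed query instead of parsing a new string for every person.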

Is there anything I can do to speed neo4j up (to be faster than mysql)?

There is also another benchmark on Stack Overflow with the same problem.

asked Jul 23 '13 22:07 by Joerg Baach



2 Answers

I'm sorry you can't reproduce the results. However, on a MacBook Air (1.8 GHz i7, 4 GB RAM) with a 2 GB heap, GCR cache, but no warming of caches, and no other tuning, with a similarly sized dataset (1 million users, 50 friends per person), I repeatedly get approx 900 ms using the Traversal Framework on 1.9.2:

public class FriendOfAFriendDepth4 {
    private static final TraversalDescription traversalDescription =
        Traversal.description()
            .depthFirst()
            .uniqueness( Uniqueness.NODE_GLOBAL )
            .relationships( withName( "FRIEND" ), Direction.OUTGOING )
            .evaluator( new Evaluator()
            {
                @Override
                public Evaluation evaluate( Path path )
                {
                    if ( path.length() >= 4 )
                    {
                        return Evaluation.INCLUDE_AND_PRUNE;
                    }
                    return Evaluation.EXCLUDE_AND_CONTINUE;
                }
            } );

    private final Index<Node> userIndex;

    public FriendOfAFriendDepth4( GraphDatabaseService db )
    {
        this.userIndex = db.index().forNodes( "user" );
    }

    public Iterator<Path> getFriends( String name )
    {
        return traversalDescription.traverse(
            userIndex.get( "name", name ).getSingle() )
                .iterator();
    }

    public int countFriends( String name )
    {
        return count( traversalDescription.traverse(
            userIndex.get( "name", name ).getSingle() )
                .nodes().iterator() );
    }
}

Cypher is slower, but nowhere near as slow as you suggest: approx 3 seconds:

START person=node:user(name={name})
MATCH (person)-[:FRIEND]->()-[:FRIEND]->()-[:FRIEND]->()-[:FRIEND]->(friend)
RETURN count(friend)

Kind regards

ian

answered Oct 05 '22 16:10 by Ian Robinson


Yes, I believe the REST API is significantly slower than the regular bindings and therein lies your performance problem.
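One way to test this suspicion is to time the same request several times, compare the first (cold) run with later ones, and then compare that with an equivalent embedded call. The helper below is a generic sketch in Python; `time_runs` is a hypothetical name, and the lambda stands in for whatever actually issues the query.

```python
import statistics
import time


def time_runs(fn, runs=5):
    """Call fn() `runs` times; return (first_run_seconds, median_of_rest).

    The first call is reported separately because it typically includes
    cache warm-up; the median of the remaining calls is less noisy.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs")
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - started)
    return timings[0], statistics.median(timings[1:])


if __name__ == "__main__":
    cold, warm = time_runs(lambda: sum(range(100_000)))
    print("cold: %.4fs, warm median: %.4fs" % (cold, warm))
```

If the warm median over REST stays far above the embedded figure for the same query, transport and serialisation overhead is a plausible culprit; if both are slow, the query itself is the more likely cause.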

answered Oct 05 '22 16:10 by whistler