This is a follow-up to "can't reproduce/verify the performance claims in graph databases and neo4j in action books". I have updated the setup and tests, and don't want to change the original question too much.
The whole story (including scripts etc.) is at https://baach.de/Members/jhb/neo4j-performance-compared-to-mysql
Short version: while trying to verify the performance claims made in the 'Graph Databases' book, I came to the following results (querying a random dataset containing n people, with 50 friends each):
My results for 100k people (times in seconds):

depth  neo4j      mysql    python
1      0.010      0.000    0.000
2      0.018      0.001    0.000
3      0.538      0.072    0.009
4      22.544     3.600    0.330
5      1269.942*  180.143  0.758

"*": single run only
My results for 1 million people (times in seconds):

depth  neo4j      mysql    python
1      0.010      0.000    0.000
2      0.018      0.002    0.000
3      0.689      0.082    0.012
4      30.057     5.598    1.079
5      1441.397*  300.000  9.791

"*": single run only
Using Neo4j 1.9.2 on 64-bit Ubuntu, I have set up neo4j.properties with these values:
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=2048M
and neo4j-wrapper.conf with:
wrapper.java.initmemory=1024
wrapper.java.maxmemory=8192
My query to Neo4j looks like this (using the REST API):
start person=node:node_auto_index(noscenda_name="person123") match (person)-[:friend]->()-[:friend]->(friend) return count(distinct friend);
The node_auto_index is in place, obviously.
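For reference, here is a minimal sketch of how such a query can be POSTed to the Cypher REST endpoint of a 1.9 server. The default localhost:7474 address and the use of the requests library are assumptions, not part of the original setup:

# Minimal sketch: POST the Cypher query to a Neo4j 1.9 server's
# /db/data/cypher REST endpoint. Host, port and the 'requests'
# dependency are assumptions for the example.
import requests

CYPHER_URL = "http://localhost:7474/db/data/cypher"

query = (
    'start person=node:node_auto_index(noscenda_name="person123") '
    'match (person)-[:friend]->()-[:friend]->(friend) '
    'return count(distinct friend)'
)

response = requests.post(CYPHER_URL, json={"query": query})
response.raise_for_status()
# The endpoint returns {"columns": [...], "data": [[...]]}; the count
# is the first cell of the first row.
print(response.json()["data"][0][0])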
Is there anything I can do to speed Neo4j up (to be faster than MySQL)?
There is also another benchmark on Stack Overflow with the same problem.
For reference, the claims in the book are: "For the simple friends of friends query, Neo4j is 60% faster than MySQL. For friends of friends of friends, Neo is 180 times faster. And for the depth four query, Neo4j is 1,135 times faster."
Neo4j is not generally faster than an SQL database. It is just faster in many cases for graph-based problems. For example, if you'd like to find the shortest path between two entities, Neo4j will most likely outperform MySQL etc.
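To illustrate (a hypothetical query, not taken from the benchmark), a shortest-path lookup in Cypher 1.9 could be sent to the same endpoint as a string like this; the person names are assumed for the example:

# Hypothetical illustration of the kind of query where a graph database
# shines: a variable-length shortest-path search, as a Cypher 1.9 string.
# The person names in the index lookup are assumptions.
SHORTEST_PATH_QUERY = (
    'start a=node:node_auto_index(noscenda_name="person1"), '
    'b=node:node_auto_index(noscenda_name="person2") '
    'match p = shortestPath(a-[:friend*..15]->b) '
    'return length(p)'
)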
Conclusions: In this paper, we propose a domain ontology building process based on the Neo4j graph database and a retrieval method based on a two-tier index architecture. Our assessment shows that our approach can save 13.04% of the storage space and is 30 times more efficient compared to relational databases.
Complex queries typically run faster in graph databases than in relational databases, because relational databases must perform complex joins across data tables to answer them, which slows the process.
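To make that concrete, here is roughly what a depth-2 version of the benchmark query looks like in SQL, with each additional level of depth adding another self-join. The friends(source, target) edge table is an assumption; the actual scripts are on the page linked above:

# Illustration only: friend-of-friend counting as a relational database
# has to express it, via self-joins on an edge table. Table and column
# names (friends, source, target) are assumed for the example.
DEPTH_2_SQL = """
SELECT COUNT(DISTINCT f2.target)
FROM friends AS f1
JOIN friends AS f2 ON f2.source = f1.target
WHERE f1.source = %(person_id)s
"""
# At depth n the intermediate result holds up to 50^n rows before
# de-duplication, which is why every engine degrades sharply at
# depths 4 and 5 in the tables above.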
I'm sorry you can't reproduce the results. However, on a MacBook Air (1.8 GHz i7, 4 GB RAM) with a 2 GB heap, GCR cache, but no warming of caches, and no other tuning, with a similarly sized dataset (1 million users, 50 friends per person), I repeatedly get approx 900 ms using the Traversal Framework on 1.9.2:
import java.util.Iterator;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Path;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.traversal.Evaluation;
import org.neo4j.graphdb.traversal.Evaluator;
import org.neo4j.graphdb.traversal.TraversalDescription;
import org.neo4j.kernel.Traversal;
import org.neo4j.kernel.Uniqueness;

import static org.neo4j.graphdb.DynamicRelationshipType.withName;
import static org.neo4j.helpers.collection.IteratorUtil.count;

public class FriendOfAFriendDepth4
{
    // Traverse outgoing FRIEND relationships depth-first, visiting each
    // node at most once, and include only paths four hops out.
    private static final TraversalDescription traversalDescription =
        Traversal.description()
            .depthFirst()
            .uniqueness( Uniqueness.NODE_GLOBAL )
            .relationships( withName( "FRIEND" ), Direction.OUTGOING )
            .evaluator( new Evaluator()
            {
                @Override
                public Evaluation evaluate( Path path )
                {
                    if ( path.length() >= 4 )
                    {
                        return Evaluation.INCLUDE_AND_PRUNE;
                    }
                    return Evaluation.EXCLUDE_AND_CONTINUE;
                }
            } );

    private final Index<Node> userIndex;

    public FriendOfAFriendDepth4( GraphDatabaseService db )
    {
        this.userIndex = db.index().forNodes( "user" );
    }

    // Look the user up by name in the "user" index and traverse from there.
    public Iterator<Path> getFriends( String name )
    {
        return traversalDescription
            .traverse( userIndex.get( "name", name ).getSingle() )
            .iterator();
    }

    public int countFriends( String name )
    {
        return count( traversalDescription
            .traverse( userIndex.get( "name", name ).getSingle() )
            .nodes().iterator() );
    }
}
Cypher is slower, but nowhere near as slow as you suggest: approx 3 seconds:
START person=node:user(name={name}) MATCH (person)-[:FRIEND]->()-[:FRIEND]->()-[:FRIEND]->()-[:FRIEND]->(friend) RETURN count(friend)
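Over the REST API, the {name} placeholder in that query would be supplied through the request's params map, along these lines (endpoint and name value assumed as in the earlier sketch):

# Sketch: parameterised Cypher over REST. The params map fills the {name}
# placeholder server-side, which also lets Neo4j reuse the execution plan.
payload = {
    "query": (
        "START person=node:user(name={name}) "
        "MATCH (person)-[:FRIEND]->()-[:FRIEND]->()-[:FRIEND]->()-[:FRIEND]->(friend) "
        "RETURN count(friend)"
    ),
    "params": {"name": "person123"},
}
# requests.post(CYPHER_URL, json=payload) as in the earlier sketch.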
Kind regards
ian
Yes, I believe the REST API is significantly slower than the regular bindings, and therein lies your performance problem.