
Neo4j node creation speed

Tags:

neo4j

I have a fresh neo4j setup on my laptop, and creating new nodes via the REST API seems to be quite slow (~30-40 ms average). I've Googled around a bit, but can't find any real benchmarks for how long it "should" take; there's this post, but that only lists relative performance, not absolute performance. Is neo4j inherently limited to only adding ~30 new nodes per second (outside of batch mode), or is there something wrong with my configuration?

Config details:

  • Neo4j version 2.2.5
  • Server is on my mid-end 2014 laptop, running Ubuntu 15.04
  • OpenJDK version 1.8
  • Calls to the server are also from my laptop (via localhost:7474), so there shouldn't be any network latency involved
  • I'm calling neo4j via Clojure/Neocons; method used is "create" in the class clojurewerkz.neocons.rest.nodes
  • Using Cypher seems to be even slower; eg. calling "PROFILE CREATE (you:Person {name:"Jane Doe"}) RETURN you" via the HTML interface returns "Cypher version: CYPHER 2.2, planner: RULE. 5 total db hits in 54 ms."
Alyssa Vance asked Sep 23 '15 01:09




1 Answer

Neo4j performance characteristics are a tricky area.

Measuring performance

First of all, it depends a lot on how the server is configured. Measuring anything on a laptop is the wrong way to do it.

Before measuring performance, you should check the following:

  1. You have appropriate server hardware (requirements).
  2. Client and server are on the same local network.
  3. Neo4j is properly configured (memory mapping, web server thread pool, Java heap size, etc.).
  4. The server is properly configured (Linux TCP stack, maximum number of open files, etc.).
  5. The server is warmed up. Neo4j is written in Java, so you should warm it up properly before measuring anything (e.g. by putting it under load for ~15 minutes).

And one last thing: Enterprise Edition. Neo4j Enterprise Edition has some advanced features that can improve performance a lot (e.g. the HPC cache).
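The warm-up point in the checklist above can be sketched as a small timing harness. This is only an illustration, not the tooling used for the numbers in this answer: the function names `measure` and `percentile` and the call counts are assumptions, and `operation` stands for whatever request you want to time.

```python
import math
import time


def measure(operation, warmup_calls=1000, measured_calls=1000):
    """Call `operation` untimed for the warm-up phase, then time each call.

    Returns the measured latencies in milliseconds, sorted ascending.
    """
    # Warm-up phase: results are discarded so JIT compilation and cache
    # population do not distort the measured phase.
    for _ in range(warmup_calls):
        operation()

    latencies_ms = []
    for _ in range(measured_calls):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    return latencies_ms


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]
```

Reporting a high percentile alongside the mean helps here, because a single slow request can dominate the maximum without being representative.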

Neo4j internally

Neo4j internally is:

  • Storage
  • Core API
  • Traversal API
  • Cypher API

All of these run in-process, without any additional network requests. Neo4j server is built on top of this solid foundation.

So, when you make a request to Neo4j server, you are measuring:

  • Latency between client and server
  • JSON serialization costs
  • Web server (Jetty)
  • Additional modules that manage locks, transactions, etc.
  • And Neo4j itself

So the bottom line is that Neo4j is pretty fast by itself when used in embedded mode, but going through Neo4j server involves additional costs.

Numbers

We ran internal Neo4j tests and measured several cases.

Create nodes

Here we use the vanilla Transactional Cypher REST API.

Threads: 2

Nodes per transaction: 1000  
Execution time: 1635  
Total nodes created: 7000000  
Nodes per second: 7070  

Threads: 5

Nodes per transaction: 750  
Execution time: 852  
Total nodes created: 7000000  
Nodes per second: 8215  
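To illustrate what "nodes per transaction" means with the transactional Cypher endpoint, here is a minimal sketch that groups many CREATE statements into one request body. It is not the code used for the numbers above: the helper names `chunked` and `build_batch_payload`, the `Person` label, and the sample property values are assumptions made for the example. Each body would be POSTed to `/db/data/transaction/commit`, and the `{props}` placeholder is the Cypher 2.x parameter syntax.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_payload(node_properties, label="Person"):
    """Build the JSON body for one transaction that creates several nodes.

    One CREATE statement is emitted per property map, so the whole batch
    is committed as a single transaction by the server.
    """
    statements = [
        {
            "statement": "CREATE (n:%s {props})" % label,
            "parameters": {"props": props},
        }
        for props in node_properties
    ]
    return json.dumps({"statements": statements})
```

For example, `chunked(all_nodes, 1000)` combined with `build_batch_payload` gives one request body per 1000 nodes, which amortizes the HTTP and transaction overhead over the whole batch instead of paying it per node.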

Huge database sync

This one uses a custom-developed unmanaged extension, with a binary protocol between server and client and some concurrency.

But this is still Neo4j server (in fact, a Neo4j cluster).

Node count: 80.32M (80 320 000)
Relationship count: 80.30M (80 300 000)
Property count: 257.78M (257 780 000)
Consumed time: 2142 seconds

Per second:
Nodes - 37497
Relationships - 37488
Properties - 120345

These numbers show the true power of Neo4j.
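The per-second figures for the sync follow directly from the totals and the consumed time. A short check, assuming the rates are simply truncated to whole numbers (the function name `per_second` is made up for this sketch):

```python
def per_second(total, seconds):
    """Whole items processed per second (integer division, i.e. truncated)."""
    return total // seconds


SYNC_SECONDS = 2142

nodes_rate = per_second(80_320_000, SYNC_SECONDS)
relationships_rate = per_second(80_300_000, SYNC_SECONDS)
properties_rate = per_second(257_780_000, SYNC_SECONDS)

print(nodes_rate)          # 37497
print(relationships_rate)  # 37488
print(properties_rate)     # 120345
```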

My numbers

I tried measuring performance myself just now.

Setup: a fresh, unconfigured database (2.2.5) on Ubuntu 14.04 (VM).

Results:

$ ab -p post_loc.txt -T application/json -c 1 -n 10000 http://localhost:7474/db/data/node
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Jetty(9.2.4.v20141103)
Server Hostname:        localhost
Server Port:            7474

Document Path:          /db/data/node
Document Length:        1245 bytes

Concurrency Level:      1
Time taken for tests:   14.082 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      14910000 bytes
Total body sent:        1460000
HTML transferred:       12450000 bytes
Requests per second:    710.13 [#/sec] (mean)
Time per request:       1.408 [ms] (mean)
Time per request:       1.408 [ms] (mean, across all concurrent requests)
Transfer rate:          1033.99 [Kbytes/sec] received
                        101.25 kb/s sent
                        1135.24 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0      19
Processing:     1    1   1.3      1      53
Waiting:        0    1   1.2      1      53
Total:          1    1   1.3      1      54

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      3
  99%      4
 100%     54 (longest request)

This run creates 10000 nodes with no properties through the REST API, using 1 thread.

As you can see, even on my laptop in a Linux VM with default settings, Neo4j creates nodes in 4 ms or less (99th percentile).

Note: I warmed up the database beforehand (created and deleted 100K nodes).
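For readers who want to reproduce this from a script instead of `ab`, here is a rough sketch of the same request using only the Python standard library. The function names are assumptions, and the sketch assumes the server accepts unauthenticated requests, as the `ab` command above does; it is not the exact request body from `post_loc.txt`, whose contents are not shown here.

```python
import json
import urllib.request

# Same endpoint as in the ab command above.
NODE_URL = "http://localhost:7474/db/data/node"


def build_create_node_request(properties=None, url=NODE_URL):
    """Build (but do not send) a POST request that creates one node."""
    # With no properties the request carries no body at all.
    body = json.dumps(properties).encode("utf-8") if properties else None
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def create_node(properties=None, url=NODE_URL):
    """Send the request and return the HTTP status code."""
    request = build_create_node_request(properties, url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `create_node()` in a loop, after a warm-up phase, and timing each call gives per-request latencies comparable to the "Time per request" line in the `ab` output.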

Bolt

If you are looking for the best Neo4j performance, you should follow the development of Bolt, the new binary protocol for Neo4j server.

More info: here, here and here.

FylmTM answered Oct 02 '22 07:10