
Insert performance with and without Index

I was running a couple of tests.

Based on some great suggestions from Wes and others, I tuned some of the Neo4j properties (with no cache) to do large-scale inserts in a multithreaded environment, and the performance is not bad.

However, when I introduce an index on the nodes, the performance degrades a lot; the difference is easily fivefold. Are there configuration settings that would improve this?

Thanks in advance,

Sachin

Neo4j version - 1.8.1; JVM - 1.6

user2158600 asked Oct 04 '22 17:10



1 Answer

Inserting nodes (or relationships) into a Lucene index is costly. Lucene is a powerful but complex tool, designed for full-text and keyword search. Compared with writing to the bare database, it is rather slow.

This is why most bulk insert tools do the indexing asynchronously, like Michael's batch inserter:

http://jexp.de/blog/2012/10/parallel-batch-inserter-with-neo4j/
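The general idea of asynchronous indexing can be sketched in plain Java as follows. This is only an illustration of the pattern, not the code from the linked post: `AsyncIndexer`, `Entry` and `IndexSink` are hypothetical names, and `IndexSink` is a stand-in for whatever actually writes to the index, not a Neo4j or Lucene API. Insert threads hand index entries to a queue and return immediately, while a single worker thread drains the queue and writes the entries in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncIndexer {

    /** One pending index entry: a lookup key and the id of the node it refers to. */
    public static final class Entry {
        final String key;
        final long nodeId;
        Entry(String key, long nodeId) { this.key = key; this.nodeId = nodeId; }
    }

    /** Hypothetical stand-in for the real index writer (assumption, not a Neo4j API). */
    public interface IndexSink {
        void writeBatch(List<Entry> batch);
    }

    // Sentinel that tells the worker to flush what it has and stop.
    private static final Entry POISON = new Entry(null, -1L);

    private final BlockingQueue<Entry> queue = new LinkedBlockingQueue<Entry>();
    private final Thread worker;

    public AsyncIndexer(final IndexSink sink, final int batchSize) {
        worker = new Thread(new Runnable() {
            public void run() {
                List<Entry> batch = new ArrayList<Entry>(batchSize);
                try {
                    while (true) {
                        Entry e = queue.take();          // blocks until an entry arrives
                        if (e == POISON) break;
                        batch.add(e);
                        if (batch.size() >= batchSize) { // write one full batch at a time
                            sink.writeBatch(new ArrayList<Entry>(batch));
                            batch.clear();
                        }
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
                if (!batch.isEmpty()) sink.writeBatch(batch); // flush the remainder
            }
        });
        worker.start();
    }

    /** Called by the insert threads; returns without waiting for the index write. */
    public void submit(String key, long nodeId) {
        queue.add(new Entry(key, nodeId));
    }

    /** Signals the worker to finish and waits until all queued entries are written. */
    public void close() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that the index lags behind the inserted data, and entries still sitting in the queue are lost if the process dies, so this pattern suits one-off bulk loads rather than normal transactional operation.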

Some even circumvent transactions, or write the store files directly:

http://blog.xebia.com/2012/11/13/combining-neo4j-and-hadoop-part-i/

To improve performance, using an SSD could help. But because Neo4j is a fully ACID transactional database and the Lucene index is tightly coupled with its transactions (which is a good thing), there is not much else you can do besides optimizing your infrastructure for the best write performance.

Axel Morgner answered Oct 18 '22 09:10
