I was running a couple of tests.
Based on some great suggestions by Wes and others, I tuned some of the Neo4j properties (with no cache) for large-scale inserts in a multithreaded environment, and the performance is not bad.
However, when I introduce an index on the nodes, performance degrades a lot; the difference is easily fivefold. Are there any configuration settings that would improve this?
Thanks in advance,
Sachin
Neo4j version - 1.8.1; JVM - 1.6
Inserting nodes (or relationships) into a Lucene index is costly. Lucene is a powerful but complex tool, designed for fulltext/keyword search. Compared with the bare database, it is rather slow.
This is why most bulk insert tools do the indexing asynchronously, like Michael's batch inserter:
http://jexp.de/blog/2012/10/parallel-batch-inserter-with-neo4j/
Some even circumvent transactions, or write the store files directly:
http://blog.xebia.com/2012/11/13/combining-neo4j-and-hadoop-part-i/
To improve performance, using an SSD could help. But as Neo4j is a fully ACID transactional database, and the Lucene index is tightly coupled with the transactions (which is a good thing), there's not much else you can do besides optimizing your infrastructure for best write performance.
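To make the asynchronous indexing idea above a bit more concrete, here is a minimal sketch in plain Java. It is not taken from the linked batch inserter and does not use the Neo4j or Lucene APIs: `SlowIndex`, `IndexEntry` and `AsyncIndexer` are hypothetical names, and the in-memory map only stands in for a real index. The inserting threads hand index entries to a bounded queue, and a single background thread writes them to the index in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical key -> node id pair that should end up in the index.
class IndexEntry {
    final String key;
    final long nodeId;

    IndexEntry(String key, long nodeId) {
        this.key = key;
        this.nodeId = nodeId;
    }
}

// Hypothetical stand-in for an expensive index (assumption: not a Neo4j/Lucene API).
class SlowIndex {
    private final Map<String, Long> entries = new ConcurrentHashMap<String, Long>();

    void addAll(List<IndexEntry> batch) {
        for (IndexEntry e : batch) {
            entries.put(e.key, e.nodeId);
        }
    }

    Long lookup(String key) {
        return entries.get(key);
    }

    int size() {
        return entries.size();
    }
}

// Decouples the insert threads from index writes via a queue and one worker thread.
class AsyncIndexer {
    // Sentinel that tells the worker to stop; compared by identity.
    private static final IndexEntry POISON = new IndexEntry("", -1L);

    // Bounded queue: submit() blocks when the indexer falls too far behind.
    private final BlockingQueue<IndexEntry> queue = new LinkedBlockingQueue<IndexEntry>(10000);
    private final SlowIndex index;
    private final int batchSize;
    private final Thread worker;

    AsyncIndexer(SlowIndex index, int batchSize) {
        this.index = index;
        this.batchSize = batchSize;
        this.worker = new Thread(new Runnable() {
            public void run() {
                drain();
            }
        }, "async-indexer");
    }

    void start() {
        worker.start();
    }

    // Called by the insert threads right after the node itself has been stored.
    void submit(String key, long nodeId) throws InterruptedException {
        queue.put(new IndexEntry(key, nodeId));
    }

    // Call once, after all submit() calls; waits until every entry is indexed.
    void close() throws InterruptedException {
        queue.put(POISON);
        worker.join();
    }

    private void drain() {
        List<IndexEntry> batch = new ArrayList<IndexEntry>(batchSize);
        try {
            while (true) {
                batch.add(queue.take());              // wait for at least one entry
                queue.drainTo(batch, batchSize - 1);  // then grab whatever else is ready
                boolean done = batch.remove(POISON);  // true if the stop sentinel was seen
                if (!batch.isEmpty()) {
                    index.addAll(batch);              // one index write per batch
                }
                batch.clear();
                if (done) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class AsyncIndexDemo {
    public static void main(String[] args) throws Exception {
        SlowIndex index = new SlowIndex();
        AsyncIndexer indexer = new AsyncIndexer(index, 100);
        indexer.start();
        for (long id = 0; id < 1000; id++) {
            indexer.submit("user-" + id, id);  // the node insert itself would happen here
        }
        indexer.close();
        System.out.println("indexed entries: " + index.size());
    }
}
```

Keep in mind the trade-off: in a setup like this the index writes are no longer part of the transaction that created the node, so the index lags behind the store and the two can get out of sync after a crash. That is usually acceptable for an initial bulk load, much less so for normal operation.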