I'm looking for best practices for handling my data. This is what I have so far: 1,000,000 nodes of type "A". Every "A" node can be connected to 1-1000 nodes of type "B" and 1-10 nodes of type "C".
I've written a RESTful service (Java, Jersey) to import data into a Neo4j graph. After importing the "A" nodes (only the nodes, with IDs, no further data), I noticed that the Neo4j database had grown to ~2.4GB.
Is it a good idea to store additional fields (name, description, ...) in Neo4j? Or should I set up MongoDB/Hadoop and use a key/value store for data access?
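For reference, the structure looks roughly like this. This is only a minimal sketch against the embedded Java API (Neo4j 1.x-style transactions); the relationship types and property names are placeholders, not my actual import code:

```java
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class ModelSketch {
    // Placeholder relationship types, just to illustrate the A->B / A->C links
    enum RelTypes implements RelationshipType { HAS_B, HAS_C }

    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("data/graph.db");   // example path
        Transaction tx = db.beginTx();
        try {
            Node a = db.createNode();                    // one of the ~1,000,000 "A" nodes
            a.setProperty("externalId", 42L);            // only the id, no further data yet

            Node b = db.createNode();                    // each "A" links to 1-1000 "B" nodes
            a.createRelationshipTo(b, RelTypes.HAS_B);

            Node c = db.createNode();                    // and to 1-10 "C" nodes
            a.createRelationshipTo(c, RelTypes.HAS_C);

            tx.success();
        } finally {
            tx.finish();                                 // tx.close() on newer versions
        }
        db.shutdown();
    }
}
```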
Did you delete a lot of nodes during the insert? Normally a node takes 9 bytes on disk, so your 1M nodes should take just about 9MB. You have to enable id reuse to aggressively reclaim that space.
Could you please list the content of your data directory with the file sizes?
In general it is not a problem to store your other fields in Neo4j, as long as they are not large blob fields.
How did you create the db?
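For a one-time initial load, the BatchInserter is usually a faster and leaner alternative to creating nodes in individual transactions. A rough sketch, assuming an example store path and property names (the package of BatchInserters has moved between Neo4j versions, so adjust the imports to your version):

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BulkImport {
    public static void main(String[] args) {
        // Point this at a fresh, empty store directory (example path)
        BatchInserter inserter = BatchInserters.inserter("target/graph.db");
        try {
            for (long i = 0; i < 1000000; i++) {
                Map<String, Object> props = new HashMap<String, Object>();
                props.put("externalId", i);          // the id from your source data
                props.put("name", "A-" + i);         // plain string fields are fine as properties
                props.put("description", "...");
                inserter.createNode(props);          // returns the internal node id
            }
        } finally {
            inserter.shutdown();                     // flushes and closes the store files
        }
    }
}
```

Afterwards you can open the same store directory with the normal embedded API or the server.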