
Handling big data sets (neo4j, mongo db, hadoop)

I'm looking for best practices for handling this data. Here is what I have so far: 1,000,000 nodes of type "A". Every "A" node can be connected to 1-1000 nodes of type "B" and 1-10 nodes of type "C".

I've written a RESTful service (Java, Jersey) to import data into a Neo4j graph. After importing the "A" nodes (only the nodes, with ids, no further data) I noticed that the Neo4j db has grown to ~2.4GB.

Is it a good idea to store additional fields (name, description, ...) in Neo4j? Or should I set up MongoDB/Hadoop and use a key/value combination for data access?
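For context, a minimal sketch of how such a batched import could look with Neo4j's embedded 2.x Java API (the class name, store path and the "externalId" property are placeholders, not the actual Jersey service code):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class ANodeImport {

        // Insert "A" nodes in large transactions rather than one transaction
        // per node/REST call; many tiny transactions slow the import down and
        // bloat the transaction logs.
        static void importANodes(GraphDatabaseService db, long count) {
            final int batchSize = 10_000;
            for (long start = 0; start < count; start += batchSize) {
                try (Transaction tx = db.beginTx()) {
                    long end = Math.min(start + batchSize, count);
                    for (long i = start; i < end; i++) {
                        Node a = db.createNode();
                        a.setProperty("externalId", i); // placeholder property
                    }
                    tx.success();
                }
            }
        }

        public static void main(String[] args) {
            GraphDatabaseService db =
                    new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
            try {
                importANodes(db, 1_000_000);
            } finally {
                db.shutdown();
            }
        }
    }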

asked Nov 04 '22 by Alebon

1 Answer

Did you delete a lot of nodes during the insert? Normally a node takes 9 bytes on disk, so your 1M nodes should take just 9M bytes. You have to enable id reuse to reclaim that space aggressively.

Could you please list the content of your data directory with the file sizes?

In general it is not a problem to put your other fields in Neo4j, as long as they are not large blob fields.
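As a rough illustration of that split (a sketch only; the property names and the "blobKey" reference are hypothetical): short fields go straight onto the node, while a large blob would stay in an external key/value store (MongoDB, HDFS, ...) with only its key kept in the graph.

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public class PropertyExample {

        // Short scalar fields are fine as node properties.
        static void addFields(GraphDatabaseService db, long nodeId,
                              String name, String description) {
            try (Transaction tx = db.beginTx()) {
                Node a = db.getNodeById(nodeId);
                a.setProperty("name", name);
                a.setProperty("description", description);
                tx.success();
            }
        }

        // For a large blob, store only a reference key in the graph and keep
        // the payload itself in the external key/value store.
        static void addBlobReference(GraphDatabaseService db, long nodeId,
                                     String externalBlobKey) {
            try (Transaction tx = db.beginTx()) {
                db.getNodeById(nodeId).setProperty("blobKey", externalBlobKey);
                tx.success();
            }
        }
    }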

How did you create the db?

answered Nov 09 '22 by Michael Hunger