
neo4j "empty" database takes up a lot of disk space

I inserted ~2M nodes (via the Java API) and deleted them (also from Java) after a day or two of use. Now my database has 16k nodes and weighs 6 GB.

Why wasn't this space freed?

What might be the cause?

asked Jun 23 '14 14:06 by m.cichacz



2 Answers

The data/graph.db directory contains multiple items:

  • Store itself, split into multiple files
  • Indexes
  • Transaction log files
  • Log files (messages.log)

All your operations are recorded in the transaction logs, which are pruned according to the keep_logical_logs setting. I'm not sure what the default value is, but I suspect quite a lot of space is in use there.

I'd suggest checking what is taking up the space.
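A `du`-style breakdown of the store directory is enough to see where the space goes. Below is a minimal sketch in plain Java (no Neo4j dependency). The class and method names are made up for illustration, and `data/graph.db` is only an assumed default path, so pass your actual store directory as the first argument. It sums file sizes per top-level entry, so you can tell whether the store files, the indexes or the transaction logs dominate.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: reports disk usage per top-level entry of a directory.
class StoreUsage {

    // Sums the size of every regular file under root, keyed by the first
    // path component below root (a top-level file or subdirectory name).
    static Map<String, Long> usageByEntry(Path root) throws IOException {
        Map<String, Long> usage = new TreeMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                String entry = root.relativize(p).getName(0).toString();
                try {
                    usage.merge(entry, Files.size(p), Long::sum);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return usage;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real store directory instead.
        Path root = Paths.get(args.length > 0 ? args[0] : "data/graph.db");
        // Print the largest entries first, sizes in bytes.
        usageByEntry(root).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%,15d  %s%n", e.getValue(), e.getKey()));
    }
}
```

With Java 11+ this can be run directly, e.g. `java StoreUsage.java /path/to/graph.db`. Running it once with Neo4j up and once with it stopped also shows the difference mentioned below.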

Also, we have sometimes seen that the space in use (as reported by du, for example) differs depending on whether Neo4j is running or stopped.

answered Oct 03 '22 15:10 by albertoperdomo


In addition to Alberto's answer, the store is not compacted. It leaves the empty records for reuse, and they will stay there forever. As far as I know, there is no available tool to compact the store (I've considered writing one myself, but usually convince myself that there aren't that many use cases affected by this).
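A toy model shows why the file stays large. The sketch below is NOT Neo4j's implementation: `RecordStore` and its methods are hypothetical, and it simply assumes fixed-size records plus a list of freed ids, as described above. Deleting a record only marks its slot empty, and later inserts fill those slots before the file grows.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a non-compacting record store (hypothetical, not Neo4j code).
class RecordStore {
    private final List<String> records = new ArrayList<>();    // stands in for the store file
    private final Deque<Integer> freeIds = new ArrayDeque<>(); // ids of deleted records

    int create(String value) {
        Integer id = freeIds.poll();  // reuse a deleted slot if one is available
        if (id != null) {
            records.set(id, value);
            return id;
        }
        records.add(value);           // otherwise grow the file by one record
        return records.size() - 1;
    }

    void delete(int id) {
        records.set(id, null);        // mark the slot empty; the file does not shrink
        freeIds.push(id);
    }

    int fileSizeInRecords() {
        return records.size();
    }

    long liveRecords() {
        return records.stream().filter(r -> r != null).count();
    }

    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        for (int i = 0; i < 2000; i++) store.create("node-" + i); // bulk insert
        for (int i = 16; i < 2000; i++) store.delete(i);          // delete almost everything
        // prints "file: 2000 records, live: 16"
        System.out.println("file: " + store.fileSizeInRecords() + " records, live: " + store.liveRecords());
        for (int i = 0; i < 100; i++) store.create("new-" + i);   // new data fills freed slots
        // prints "file: 2000 records, live: 116"
        System.out.println("file: " + store.fileSizeInRecords() + " records, live: " + store.liveRecords());
    }
}
```

The file size tracks the high-water mark of records ever allocated, not the number of live ones, which matches 6 GB for 16k nodes after a 2M-node peak.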

If you have a lot of churn (frequent inserts and deletes), it's a good idea to restart the database regularly so that it reuses the records it has marked as deleted.

As Alberto mentions, one of the first things I set (the other being the heap size) when I install a new Neo4j instance is keep_logical_logs, to something like 1-7 days. If you let the logs grow forever (the default), they will get quite large.
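For reference, a config fragment along these lines should do it on a Neo4j 2.x server. The file location and the "7 days" value format are from memory of the 2.x docs rather than from this thread, so check them against your version; later releases replaced this setting (with dbms.tx_log.rotation.retention_policy, if I recall correctly).

```
# conf/neo4j.properties (Neo4j 2.x, assumed location)
# Prune logical transaction logs that only contain transactions older than 7 days
keep_logical_logs=7 days
```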

answered Oct 03 '22 14:10 by Eve Freeman