Why is there this Capacity Limit on Nodes and Relationships in neo4j?

Tags:

neo4j

I wonder why neo4j has a capacity limit on nodes and relationships. The limit on each is 2^35, which is a "little" bit more than the "normal" 2^32 integer. Common SQL databases such as MySQL store their primary keys as int (2^32) or bigint (2^64). Can you explain the advantages of this decision? In my opinion this is a key decision point when choosing a database.

Johannes asked Oct 19 '12 09:10


1 Answer

It is an artificial limit. They are going to remove it in the not-too-distant future, although I haven't heard any official ETA.

Often enough, you run into hardware limits on a single machine before you actually hit this limit.

The current option is to manually shard your graphs to different machines. Not ideal for some use cases, but it works in other cases. In the future they'll have a way to shard data automatically--no ETA on that either.

Update: I've learned a bit more about neo4j storage internals. The limits are exactly what they are because the id numbers are stored on disk as pointers in several places (node records, relationship records, etc.). To raise the limit by another power of 2, they'd need to add 1 byte per node and 1 byte per relationship; the ids are currently packed as tightly as they can be without using more bytes on disk. Learn more at this great blog post: http://digitalstain.blogspot.com/2010/10/neo4j-internals-file-storage.html
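
As a rough illustration of that kind of packing, here is a minimal Python sketch. It is not neo4j's actual record layout: the 5-byte encoding, the bit positions in the flags byte, and the names `pack_id` and `unpack_id` are all assumptions made up for this example. It only shows how a 35-bit id can fit into a 4-byte integer plus three otherwise unused bits of a flags byte, so that a larger id would need extra space on disk.

```python
import struct
from typing import Tuple

ID_BITS = 35
MAX_IDS = 1 << ID_BITS   # 2^35 = 34,359,738,368 possible ids

# Hypothetical flags byte layout (an assumption, not neo4j's real format):
#   bit 0     -> "in use" marker
#   bits 1-3  -> the three high bits of the 35-bit id
IN_USE_MASK = 0x01
HIGH_SHIFT = 1
HIGH_MASK = 0x07


def pack_id(record_id: int, in_use: bool) -> bytes:
    """Pack a 35-bit id and an in-use flag into 5 bytes (1 flags byte + 4 id bytes)."""
    if not 0 <= record_id < MAX_IDS:
        raise ValueError("id does not fit in 35 bits")
    low = record_id & 0xFFFFFFFF   # lower 32 bits go into a plain 4-byte int
    high = record_id >> 32         # upper 3 bits go into the flags byte
    flags = (high << HIGH_SHIFT) | (IN_USE_MASK if in_use else 0)
    return struct.pack(">BI", flags, low)


def unpack_id(data: bytes) -> Tuple[int, bool]:
    """Reverse pack_id: rebuild the 35-bit id and the in-use flag."""
    flags, low = struct.unpack(">BI", data)
    high = (flags >> HIGH_SHIFT) & HIGH_MASK
    return (high << 32) | low, bool(flags & IN_USE_MASK)


if __name__ == "__main__":
    biggest = MAX_IDS - 1
    packed = pack_id(biggest, True)
    print(len(packed), unpack_id(packed))
```

Under these assumptions the largest id still round-trips through 5 bytes, while `pack_id(MAX_IDS, True)` raises `ValueError`, because a 36th bit has nowhere to go without widening the record.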

Update 2:
I've heard that in 2.1 they'll be raising these limits to roughly an order of magnitude higher than they currently are.

Eve Freeman answered Oct 11 '22 19:10