 

What is the status on Neo4j's horizontal scalability project Rassilon?

Tags: neo4j, bigdata

Just wondering if anyone has any information on the status of project Rassilon, Neo4j's side project focused on improving Neo4j's horizontal scalability.

It was first announced in January 2013 here.

I'm particularly interested in knowing more about when the graph size limitation will be removed and when sharding across clusters will become available.

Mike asked Nov 25 '13




1 Answer

The node & relationship limits are going away in 2.1, which is the next release post 2.0 (which now has a release candidate).

Rassilon is definitely still in the mix. That said, that work is not taking precedence over things like the significant bundle of new features in 2.0. The reason is that Neo4j, as it stands today, is extremely capable of scaling using the variety of architectural features outlined below (with some live examples):

www.neotechnology.com/neo4j-scales-for-the-enterprise/

There's a lot of cleverness in the current architecture that allows the graph to perform and scale well without sharding. Once you start sharding, you are destined to traverse over the network, which is a bad thing (for latency, query predictability, etc.). So while there are some extremely large graphs that, largely for write-throughput reasons, must trade off performance for uber scale (by sharding), the happy thing is that most graphs don't require this compromise. Sharding is required only in the 1% case, which means that nearly everyone can have their cake and eat it too.

There are currently Neo4j clusters in production at customers with 1B+ individuals in their graph, backing web applications with tens of millions of users, and these run on comparatively small (but very fast, very efficient) clusters. To give you some idea of the kind of price-performance we regularly see: we've had users tell us that a single Neo4j instance could do the same work as 10 Oracle instances, only faster.
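To make the latency point above concrete, here is a rough back-of-envelope sketch. It is not from the original answer; the hop and round-trip timings are assumed orders of magnitude, not measurements, and are only meant to show why traversals that cross shard boundaries get expensive quickly:

```python
# Back-of-envelope latency sketch (illustrative only; the timings below are
# assumed orders of magnitude, not measurements from the answer above).
IN_MEMORY_HOP_S = 1e-6        # ~1 microsecond to follow a relationship in local cache
NETWORK_ROUND_TRIP_S = 5e-4   # ~0.5 ms when a hop has to cross to another shard

def traversal_latency(hops: int, cross_shard_fraction: float) -> float:
    """Estimate traversal latency when some fraction of hops cross shard boundaries."""
    local = hops * (1 - cross_shard_fraction) * IN_MEMORY_HOP_S
    remote = hops * cross_shard_fraction * NETWORK_ROUND_TRIP_S
    return local + remote

# A 1,000-hop traversal kept on one instance vs. 10% of hops crossing shards:
print(traversal_latency(1000, 0.0))   # ~0.001 s
print(traversal_latency(1000, 0.1))   # ~0.051 s, roughly 50x slower
```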

A well-tuned Neo4j cluster can support upwards of 10K transactional writes per second, and an arbitrarily high number of reads per second. Read throughput scales linearly as instances are elastically plugged in. Cache sharding is a design pattern that ensures that you don't have to keep the entire graph in memory.
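As an illustration of the cache sharding pattern mentioned above, here is a minimal sketch: route each user's reads to a consistent "home" instance so that instance's cache stays hot for that user's neighborhood. The endpoint URIs, credentials, node labels, and Cypher query are assumptions made for the example; only the official Neo4j Python driver calls (GraphDatabase.driver, session.run) are real APIs.

```python
# Minimal sketch of the cache sharding routing pattern (assumed setup, not an
# official Neo4j feature): each user is consistently mapped to one instance so
# that instance's cache holds that user's slice of the graph.
import hashlib
from neo4j import GraphDatabase

# Hypothetical Bolt endpoints of the cluster members serving reads.
READ_INSTANCES = [
    "bolt://neo4j-1.example.com:7687",
    "bolt://neo4j-2.example.com:7687",
    "bolt://neo4j-3.example.com:7687",
]

# One driver per instance; each instance ends up caching a different portion of the graph.
drivers = {uri: GraphDatabase.driver(uri, auth=("neo4j", "secret")) for uri in READ_INSTANCES}

def instance_for_user(user_id: str) -> str:
    """Map a user to the same instance every time (simple hash-based routing)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return READ_INSTANCES[int(digest, 16) % len(READ_INSTANCES)]

def friends_of(user_id: str) -> list:
    """Run a read query on the user's 'home' instance so the traversal hits warm cache."""
    uri = instance_for_user(user_id)
    with drivers[uri].session() as session:
        result = session.run(
            "MATCH (u:User {id: $id})-[:FRIEND]->(f:User) RETURN f.id AS friend",
            id=user_id,
        )
        return [record["friend"] for record in result]
```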

Philip Rathle answered Sep 19 '22