Just wondering if anyone has any information on the status of project Rassilon, Neo4j's side project which focuses on improving horizontal scalability of Neo4j?
It was first announced in January 2013 here.
I'm particularly interested in knowing more about when the graph size limitation will be removed and when sharding across clusters will become available.
With Neo4j, you can achieve unlimited horizontal scalability via sharding for mission-critical applications with a minutes-to-milliseconds performance advantage.
The Pipeline Ingest architecture provides a way to horizontally scale the ingest of data into the graph database by using a cloud-based queuing architecture. This architecture has been scaled to support the ingest of 1 billion nodes and edges per hour.
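To make the queue-based ingest idea a bit more concrete, here is a rough, in-process sketch of the general pattern: producers push records onto a queue, and a pool of workers drains it and writes in batches. Everything in it is an assumption for illustration. The `ingest` function, the `written` list standing in for the graph, and the batch size are hypothetical, and a real deployment would use a cloud queue service and a database driver instead.

```python
import queue
import threading


def ingest(records, num_workers=4, batch_size=100):
    """Fan records out to workers through a queue; each worker writes in batches."""
    work = queue.Queue()
    written = []              # hypothetical stand-in for the graph store
    lock = threading.Lock()

    def flush(batch):
        # A real worker would commit one database transaction per batch here.
        with lock:
            written.extend(batch)

    def worker():
        batch = []
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                break
            batch.append(item)
            if len(batch) >= batch_size:
                flush(batch)
                batch = []
        if batch:             # write whatever is left over
            flush(batch)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for record in records:
        work.put(record)
    for _ in threads:         # one sentinel per worker
        work.put(None)
    for t in threads:
        t.join()
    return written
```

Adding workers raises ingest throughput only as long as the store can absorb the extra batches, which is why batching matters as much as the worker count.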
The node & relationship limits are going away in 2.1, which is the next release post 2.0 (which now has a release candidate).
Rassilon is definitely still in the mix. That said, that work is not taking precedence over things like the significant bundle of new features that are in 2.0. The reason is that Neo4j as it stands today is extremely capable of scaling, using the variety of architecture features outlined below (with some live examples):
www.neotechnology.com/neo4j-scales-for-the-enterprise/
There's lots of cleverness in the current architecture that allows the graph to perform & scale well without sharding. That matters because once you start sharding, you are destined to traverse over the network, which is a bad thing (for latency, query predictability, etc.). So while there are some extremely large graphs that, largely for write throughput reasons, must trade off performance for uber scale (by sharding), the happy thing is that most graphs don't require this compromise. Sharding is required only in the 1% case, which means that nearly everyone can have their cake and eat it too. There are currently Neo4j clusters in production at customers with 1B+ individuals in their graph, backing web applications with tens of millions of users. These use comparatively small (but very fast, very efficient) clusters. To give you some idea of the kind of price-performance we regularly see: we've had users tell us that a single Neo4j instance could do the same work as 10 Oracle instances, only faster.
A well-tuned Neo4j cluster can support upwards of 10K transactional writes per second, and an arbitrarily high number of reads per second. Read throughput scales linearly as instances are elastically plugged in. Cache sharding is a design pattern that ensures that you don't have to keep the entire graph in memory.
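As a rough illustration of cache sharding: the idea is to send requests for the same part of the graph to the same instance every time, so each instance's cache warms up for its own slice of the data. A minimal sketch, assuming requests carry a routing key such as a user ID. The instance addresses and the `route` helper are hypothetical and not part of any Neo4j API; in practice this logic would typically live in a load balancer.

```python
import hashlib

# Hypothetical instance addresses, for illustration only.
INSTANCES = ["neo4j-a:7474", "neo4j-b:7474", "neo4j-c:7474"]


def route(key, instances=INSTANCES):
    """Return the same instance every time for a given routing key."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Repeated requests for one user land on one instance, keeping its cache warm.
first = route("user:42")
second = route("user:42")
```

Every instance still holds the full graph on disk, so a request that lands on the "wrong" instance is still answered correctly, just from a colder cache. Note that plain modulo routing reshuffles most keys when an instance is added or removed; a consistent-hashing scheme would limit that.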