Architecture for a globally distributed Neo4j?

I am doing some work for an organisation that has offices in 48 countries of the world. Essentially the way they work now is that they all store data in a local copy of the database and that is replicated out to all the regions/offices in the world. On the odd occasion where they need to work directly on something where the "development copy" is on the London servers, they have to connect directly to the London servers, regardless of where they are in the world.

So let's say I want to have a single graph spanning the whole organisation, sharded so that each region has relatively fast reads of the graph. I am worried that writes are going to kill performance. I understand that writes go through a single master; does that mean there is a single master globally? I.e. if that master happens to be in London, does each write to the database from Sydney have to traverse that distance regardless of the local sharding? And what would happen if Sydney and London were cut off (for whatever reason)?

Essentially, how does Neo4j solve the global distribution problem?

asked Sep 05 '13 05:09

gremwell

1 Answer

The distribution mechanism in Neo4j Enterprise edition is indeed master-slave style. Any write request to the master is committed locally and synchronously transferred to the number of slaves defined by push_factor (default: 1). A write request to a slave is synchronously applied to the master, to the slave itself, and to enough machines to fulfill push_factor. The synchronous slave-to-master communication might hurt performance, which is why it's recommended to redirect writes to the master and distribute reads over the slaves. The cluster communication works fine on high-latency networks.
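To illustrate the "writes to the master, reads over the slaves" recommendation, here is a minimal client-side sketch. It is not part of Neo4j: the `ClusterRouter` class and the host names are hypothetical, and a real deployment would need to discover the current master rather than hard-code it.

```python
from itertools import cycle


class ClusterRouter:
    """Hypothetical router: sends writes to the master, spreads reads over slaves."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("need at least one slave to serve reads")
        self.master = master
        # Round-robin iterator over the slave host names.
        self._slaves = cycle(slaves)

    def route(self, is_write):
        """Return the host that should handle the request."""
        if is_write:
            # Writing to the master avoids the extra synchronous
            # slave-to-master hop described above.
            return self.master
        return next(self._slaves)


# Assumed host names, for illustration only.
router = ClusterRouter("london-1", ["london-2", "london-3"])
```

Calling `router.route(True)` always yields the master, while repeated `router.route(False)` calls alternate between the two slaves.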

In a multi-region setup I'd recommend having a full cluster (minimum 3 instances) in the 'primary region', with another 3-instance cluster in a secondary region running in slave-only mode. If the primary region goes down completely (this happens very rarely, but it does), the monitoring tool triggers a config change in the secondary region to enable its instances to become master. All other offices requiring fast read access then have x slave-only instances (x >= 1, depending on the read performance required). In each location you have an HAProxy instance (or other load balancer) that directs writes to the master (normally in the primary region) and reads to the local region.
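The multi-region layout above can be modelled roughly as follows. This is only a sketch of the routing and failover rules, not Neo4j or HAProxy code: the `Instance` fields, function names, and host names are assumptions made for illustration, and the actual master election is left to the cluster.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    region: str
    slave_only: bool = False   # True: may never be elected master
    is_master: bool = False
    alive: bool = True


def enable_secondary_if_primary_down(instances, primary, secondary):
    """Mimic the monitoring tool: if no primary-region instance is alive,
    lift slave-only mode in the secondary region. Returns True if changed."""
    if any(i.alive for i in instances if i.region == primary):
        return False
    for i in instances:
        if i.region == secondary:
            i.slave_only = False
    return True


def route(instances, local_region, is_write):
    """Mimic the per-location load balancer: writes go to the master,
    reads go to an alive non-master instance in the local region."""
    alive = [i for i in instances if i.alive]
    if is_write:
        masters = [i for i in alive if i.is_master]
        return masters[0].name if masters else None
    local = [i for i in alive if i.region == local_region and not i.is_master]
    if local:
        return local[0].name
    # No local reader available: fall back to any alive instance.
    return alive[0].name if alive else None
```

With a master in London and a slave-only instance in Sydney, a Sydney client would read from the Sydney instance while its writes still travel to London.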

If you want to go beyond ~20 instances in a single cluster, consider doing a serious proof of concept first. Due to the master-slave architecture, this approach does not scale indefinitely.

answered Oct 16 '22 21:10

Stefan Armbruster

Tags:

scaling

neo4j

distributed-system
