Neo4j partition

Question

Is there a way to physically separate Neo4j partitions? Meaning the following query will go to node1:

Match (a:User:Facebook)

While this query will go to another node (maybe hosted on Docker):

Match (b:User:Google)

This is my case: I want to store the data of several clients in Neo4j, hopefully lots of them. I'm not sure what the best design for that is, but it has to fulfill a few conditions:

No mixed data should be returned from a Cypher query (it is really hard to make sure that no developer will forget the ":Partition1" label, for example, in a Cypher query).
The performance of one client shouldn't affect another client. For example, if one client has lots of data and another has a small amount, or if a "heavy" query for one client is currently running, I don't want the "lite" queries of another client to suffer from slow performance.

In other words, I think that storing everything on one node will, at some point in the future, run into scalability problems as I get more clients.

By the way, is it common to have several clusters?

Also, what is the advantage of partitioning over creating a different label for each client? For example: Users_client_1, Users_client_2, etc.

FrobberOfBits · Accepted Answer

Short answer: no, there isn't.

Neo4j has high availability (HA) clusters, where you can keep a copy of your entire graph on many machines and then serve many requests against those copies quickly. But they don't partition a really huge graph so that some of it is stored here, some of it there, and the pieces are then connected by one query mechanism.

More detailed answer: graph partitioning is a hard problem, subject to ongoing research. You can read more about it over at Wikipedia, but the gist is that when you create partitions, you're splitting your graph up across multiple locations, and you then have to deal with the complication of relationships that cross partitions. Crossing partitions is an expensive operation, so the real question when partitioning is: how do you partition such that the need to cross partitions in a query comes up as infrequently as possible?

That's a really hard question, since it depends not only on the data model but on the access patterns, which may change.
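To make the cost of crossing partitions concrete, here is a minimal Python sketch that counts how many relationships end up spanning two partitions under a given assignment. The node names, the edge list, and the two assignments are made up purely for illustration; they are not taken from any real dataset or from Neo4j itself.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def count_cross_partition_edges(edges: Iterable[Edge],
                                assignment: Dict[str, int]) -> int:
    """Count relationships whose two endpoints live in different partitions."""
    return sum(1 for a, b in edges if assignment[a] != assignment[b])


# Hypothetical toy graph: a triangle (alice, bob, carol) plus a pair (dave, erin),
# joined by a single relationship between erin and alice.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "erin"),
    ("erin", "alice"),
]

# Assignment that keeps tightly connected nodes together.
by_community = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1}

# Assignment that ignores the structure and just alternates partitions.
alternating = {"alice": 0, "bob": 1, "carol": 0, "dave": 1, "erin": 0}

print(count_cross_partition_edges(edges, by_community))
print(count_cross_partition_edges(edges, alternating))
```

The same five relationships produce very different numbers of cross-partition hops depending on the assignment, and the "good" assignment is only good for as long as the data and the access patterns stay the same.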

Here's how bad the situation is (quote stolen):

Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.[3] However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor.[1] Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,[4] unless P=NP. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.

Not to leave you with too much doom and gloom: plenty of people have partitioned big graphs. Facebook and Twitter do it every day, so you can read about FlockDB on the Twitter side or avail yourself of the relevant Facebook research. But to summarize and cut to the chase: it depends on your data, and most people who partition design a custom partitioning strategy. It's not something software does for them.
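For the multi-client case in the question, one possible custom strategy is to do the routing in the application: each client gets its own, completely separate Neo4j instance, and the application picks the connection URI from the client id. The sketch below assumes that the clients' graphs are fully disjoint (no relationship ever links two clients). The `ClientRouter` class, the client ids, and the host names are all hypothetical, and actually opening a driver connection to the returned URI is deliberately left out.

```python
from typing import Dict


class UnknownClientError(KeyError):
    """Raised when no database instance is registered for a client id."""


class ClientRouter:
    """Maps a client id to the connection URI of that client's own instance."""

    def __init__(self, uris: Dict[str, str]) -> None:
        # Copy the mapping so later changes by the caller do not affect routing.
        self._uris = dict(uris)

    def uri_for(self, client_id: str) -> str:
        """Return the URI for client_id, failing loudly if it is unknown."""
        try:
            return self._uris[client_id]
        except KeyError:
            # Refuse to fall back to a shared default instance, so data of
            # different clients can never be mixed by accident.
            raise UnknownClientError(client_id) from None


# Hypothetical host names; each one stands for a separate instance or container.
router = ClientRouter({
    "facebook": "bolt://neo4j-facebook:7687",
    "google": "bolt://neo4j-google:7687",
})

print(router.uri_for("facebook"))
print(router.uri_for("google"))
```

With this layout a query can only ever see the data of the client whose instance it was sent to, and a heavy query for one client runs on different hardware than the others. The trade-off is that you operate many instances, and queries that span clients are not possible.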

Finally, other architectures (such as Apache Giraph) can auto-partition in some senses: if you store a graph on top of Hadoop, and Hadoop already automagically scales across a cluster, then technically this is partitioning your graph for you, automagically. Cool, right? Well... cool until you realize that you still have to execute graph traversal operations all over the place. These may perform very poorly because all of those partitions have to be traversed, which is exactly the performance situation you're usually trying to avoid by partitioning wisely in the first place.

Tags:

partitioning

neo4j

Lior Goldemberg

1 Answers

FrobberOfBits
