Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can graph databases scale horizontally, if at all?

With key-value, document, and column-family databases, I understand you can scale out with combinations of replication and sharding in the keyspace. But, with common graph operations like shortest path, etc. -- these don't really seem to gain any benefit from replication...and I can't see how you would shard a graph database without finding an independent subgraph (very difficult).

Are there graph databases that try to tackle this problem? What is the current research in this area?

like image 214
dougvk Avatar asked Jul 20 '11 14:07

dougvk


People also ask

Are graph databases horizontally scalable?

A graph database that distributes its data should be able to connect two vertices with an edge even when those two vertices are stored on two different servers. Distribution pairs well with horizontal scalability.

Can Neo4j scale horizontally?

Limitless Horizontal Scaling With Neo4j, you can achieve unlimited horizontal scalability via sharding for mission-critical applications with a minutes-to-milliseconds performance advantage.

Why do graph databases struggle with scaling out?

12. Explain why graph databases tend to struggle with scaling out. Graph databases do not scale very well to clusters as they specialize in highly related data, not independent pieces of data.

What is database horizontal scaling?

What is Horizontal Scaling? Horizontal scaling, also known as scale-out, refers to bringing on additional nodes to share the load. This is difficult with relational databases due to the difficulty in spreading out related data across nodes.


1 Answers

Replication can be useful for any kind of database - it's just creating multiple copies of data so you can serve more queries than a single server can handle.

Sharding is a little more complex, but actually not too different from key/value or document stores, because internally edges have to be represented as simple lists.

While it is true that finding independent subgraphs is impossible in most cases, it isn't actually necessary. As long as the node processing the query is able to get data from other nodes, having the data available locally is just a performance optimisation.

Once you have that set up, you have a lot of options for optimising performance based on the type of graph you are working with - for example in a social graph you might use location to select the node for a user because you know that most connections are local.

I'm not aware of any existing graph databases that have sharding built in, probably because the problem is much harder to solve for the general case and the small size of edge data means you need a really large graph to exceed the capacity of a single server.

like image 170
Tom Clarkson Avatar answered Sep 18 '22 15:09

Tom Clarkson