Why do "joins" reduce scalability in large-scale distributed database system?

Question

I wonder how and why "join" reduce scalability in large-scale distributed (relational) database system?

Thanks.

alphazero · Accepted Answer

As a general consideration, there is significant overhead (e.g. non-user computation) in a distributed system that present a 'coherent' and 'unified' facade.

Simply consider these factors:

distinct nodes (e.g. servers) are distinct machines. This means the probability of having n nodes participating in a distributed action -- e.g. a join -- being in an optimal state (e.g. having just the right tables in cache, or having the appropriate locks acquired) is low. So here is some of the overhead for each node to get in the appropriate state.
naturally they need to communicate to coordinate. So there is network chatter between nodes and those latencies are not insignificant.
above overheads, in turn, increase the average time of servicing requests, and thus reduce availability (in terms of system capacity).

Scalability becomes an issue as none of the above are O(1). At the very best you can expect O(log n) and it could be as bad as O(n^2). That does wonders for killing scalability (which by definition means the ability of the system to scale to a larger number of nodes).

The above are a part of the motivation for noSQL systems, e.g. if one does not require coordination across nodes to service queries, then the performance is substantially better. (As you can see, it is not magic -- we're merely sacrificing systemic correctness for performance.)

Why do "joins" reduce scalability in large-scale distributed database system?

Tags:

database

distributed

janetsmith

1 Answers

alphazero

Recent Activity

Donate For Us

Why do "joins" reduce scalability in large-scale distributed database system?

Tags:

database

distributed

janetsmith

1 Answers

alphazero

Related questions

Recent Activity

Donate For Us