Does Apache Cassandra support sharding?

I apologize if this question seems trivial, but I cannot find the answer. I have read that Cassandra was partially modeled after Google's Bigtable, which shards on a massive scale. But most of the documentation I'm currently finding on Cassandra seems to imply that Cassandra does not partition data horizontally across multiple machines, and instead relies on many duplicate machines. This would imply that Cassandra is a good fit for high-availability reads, but would eventually break down if the write volume became very high.

Chris Dutrow asked May 07 '13 20:05


1 Answer

Cassandra does partition data across nodes (because if you can't split it, you can't scale it). All of the data in a Cassandra cluster is divided up across "the ring", and each node on the ring is responsible for one or more key ranges. You have control over the partitioner (e.g. random or order-preserving) and over how many nodes on the ring each key should be replicated to, based on your requirements.
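To make the ring idea concrete, here is a minimal sketch in Python of how keys might be mapped onto nodes. It is a toy model, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 for tokens are all assumptions made for illustration, and it ignores virtual nodes and replication strategies entirely. Each key is owned by the first node whose token is at or after the key's token, and is also copied to the following nodes on the ring.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring (hypothetical): each node owns the key range ending at its token."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        # Place each node on the ring at a token derived from its name.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        # Assumption: hash to a large integer, in the spirit of a random partitioner.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

    def replicas(self, key):
        """Return the nodes holding `key`: the owner, then its successors on the ring."""
        # First node whose token is >= the key's token; wrap around past the end.
        start = bisect.bisect_left(self._tokens, self._token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=2)
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.replicas(key))
```

Because different keys land on different nodes, writes are spread across the cluster instead of every node receiving every write, which is the point the question was unsure about.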

This contains a pretty good overview: Basic architecture

Also, I highly recommend reading the Dynamo white paper. While Cassandra is different from Dynamo in many ways, conceptually they stem from the same roots. Check it out: Dynamo White Paper

Matt Self answered Sep 17 '22 17:09