Does Apache Cassandra support sharding?
Apologies if this question seems trivial, but I can't seem to find the answer. I have read that Cassandra was partially modeled after GAE's Bigtable, which shards on a massive scale. But most of the documentation I'm currently finding on Cassandra seems to imply that Cassandra does not partition data horizontally across multiple machines, but rather supports many duplicate machines. This would imply that Cassandra is a good fit for high-availability reads, but would eventually break down if the write volume became very high.
Sharding is a partitioning pattern for the NoSQL age. It places each partition on potentially separate servers, which may be located all over the world. This scale-out approach works well when people around the world need fast access to different parts of the data set.
Sharding and partitioning are both about breaking up a large data set into smaller subsets. The difference is that sharding implies the data is spread across multiple computers while partitioning does not. Partitioning is about grouping subsets of data within a single database instance.
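To make the distinction concrete, here is a minimal Python sketch. It assumes an MD5-based hash and four partitions, and the server names are hypothetical. With partitioning, every partition maps to the same database instance; with sharding, the same partitions are assigned to different servers.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number using a stable hash (MD5 is an assumption here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 4

# Partitioning: all partitions live inside one database instance.
single_instance = {p: "db-local" for p in range(NUM_PARTITIONS)}

# Sharding: the same partitions are spread across several (hypothetical) servers.
servers = ["server-a", "server-b"]
sharded = {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}

for key in ["alice", "bob", "carol"]:
    p = partition_for(key, NUM_PARTITIONS)
    print(f"{key}: partition {p}, partitioned -> {single_instance[p]}, sharded -> {sharded[p]}")
```

The key-to-partition step is identical in both cases; only the partition-to-machine mapping changes.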
In our exploratory scheme, each partition is a foreign table and physically lives in a separate database. We call this a "shard", which can also live in a totally separate database cluster. The PostgreSQL community has a roadmap to build sharding capabilities into native PostgreSQL in upcoming versions.
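In the scheme described, PostgreSQL itself routes rows to the foreign-table shards. The following Python sketch only illustrates the underlying idea of mapping a key to the separate database that holds its partition. It assumes range partitioning on a hypothetical customer_id column; the range bounds and connection strings are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical shards: shard i owns keys in [SHARD_LOWER_BOUNDS[i], SHARD_LOWER_BOUNDS[i + 1]).
SHARD_LOWER_BOUNDS = [0, 1_000_000, 2_000_000]
SHARD_DSNS = [
    "postgresql://shard0.example/app",  # hypothetical connection strings
    "postgresql://shard1.example/app",
    "postgresql://shard2.example/app",
]


def shard_for(customer_id: int) -> str:
    """Return the connection string of the shard whose range contains customer_id."""
    if customer_id < SHARD_LOWER_BOUNDS[0]:
        raise ValueError("customer_id is below the first shard's range")
    # bisect_right finds the first bound greater than the key; the owning shard is just before it.
    index = bisect_right(SHARD_LOWER_BOUNDS, customer_id) - 1
    return SHARD_DSNS[index]


print(shard_for(42))
print(shard_for(1_500_000))
```

The last shard is open-ended here, so any key at or above its lower bound lands on it.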
Cassandra is open source. It follows a peer-to-peer architecture rather than a master-slave architecture, so there is no single point of failure. Cassandra can easily be scaled up or down. It features data replication, so it is fault-tolerant and highly available.
Cassandra does partition across nodes (because if you can't split it you can't scale it). All of the data for a Cassandra cluster is divided up onto "the ring" and each node on the ring is responsible for one or more key ranges. You have control over the Partitioner (e.g. Random, Ordered) and how many nodes on the ring a key/column should be replicated to based on your requirements.
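The ring idea can be sketched in a few lines of Python. This is a simplified model, not Cassandra's code. It assumes an MD5-based token (loosely in the spirit of RandomPartitioner; newer Cassandra versions default to Murmur3Partitioner), four hypothetical nodes with evenly spaced tokens, and SimpleStrategy-style placement: the first replica is the first node whose token is greater than or equal to the key's token, and further replicas are the next nodes clockwise.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from typing import Dict, List


def token(key: str) -> int:
    """Simplified MD5 token in [0, 2**128); not Cassandra's exact token computation."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    def __init__(self, node_tokens: Dict[str, int]):
        # Sort nodes by token: each node owns the range (previous token, its own token].
        self._ring = sorted((t, node) for node, t in node_tokens.items())
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        if replication_factor > len(self._ring):
            raise ValueError("replication factor exceeds the number of nodes")
        # First node with token >= key token; wrap to the start of the ring if none.
        start = bisect_left(self._tokens, token(key)) % len(self._ring)
        # Walk clockwise to pick the remaining replicas (SimpleStrategy-like).
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(replication_factor)]


ring = TokenRing({"node1": 0, "node2": 2**126, "node3": 2**127, "node4": 3 * 2**126})
print(ring.replicas("user:42", 3))

# Count which node is the primary replica for many keys.
primaries = Counter(ring.replicas(f"key-{i}", 1)[0] for i in range(1000))
print(primaries)
```

Counting primary replicas for many keys shows the keys, and therefore the writes for them, spread roughly evenly across all nodes rather than landing on a single machine. That is why adding nodes to the ring adds write capacity, not just read replicas.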
This contains a pretty good overview: Basic architecture
Also, I highly recommend reading the Dynamo white paper. While Cassandra is different from Dynamo in many ways, conceptually they stem from the same roots. Check it out: Dynamo White Paper