How does Cassandra split keyspace data when multiple directories are configured?

Tags:

cassandra

I have configured multiple data directories in the cassandra.yaml file, as shown below:

data_file_directories:
    - E:/Cassandra/data/var/lib/cassandra/data
    - K:/Cassandra/data/var/lib/cassandra/data

When I create a keyspace and insert data, the keyspace is created in both directories and the data is scattered across them. How does Cassandra split the data between multiple directories, and what rule does it follow?


asked Apr 10 '13 12:04

vignesh kumar rathakumar

1 Answer

Listing multiple entries under data_file_directories enables Cassandra's JBOD support. Cassandra spreads data across the configured drives in proportion to the space available on each.

This also lets you take advantage of the disk_failure_policy setting. The details are covered here: http://www.datastax.com/dev/blog/handling-disk-failures-in-cassandra-1-2

In short, you can configure Cassandra to keep going and do what it can if a disk becomes full or fails completely. This has an advantage over RAID0 (which gives you effectively the same capacity as JBOD): you do not have to restore the whole data set from backup (or with a full repair), only run a repair for the missing data. On the other hand, RAID0 provides higher throughput (depending on how well you tune the RAID array to match the filesystem and drive geometry).
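As a rough sketch of what that looks like in cassandra.yaml, the fragment below reuses the two directories from the question and sets disk_failure_policy to best_effort, one of the values described in the linked blog post (the others being stop and ignore). The choice of best_effort here is only an illustration, not a recommendation for every cluster, and the available values and default may differ between Cassandra versions.

```yaml
# JBOD: one entry per physical drive (paths taken from the question)
data_file_directories:
    - E:/Cassandra/data/var/lib/cassandra/data
    - K:/Cassandra/data/var/lib/cassandra/data

# Illustrative choice: on a disk failure, stop using the failed disk
# and keep serving requests from the remaining ones.
disk_failure_policy: best_effort
```

With a setting like this, a node that loses one drive should only need a repair to recover the data that lived on that drive.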

If you have the resources for a fault-tolerant or more performant RAID setup (RAID10, for example), you may want to just use a single directory for simplicity. That said, most deployments are starting to lean towards density, using JBOD rather than system-level fault tolerance.

You can read about the thought process behind the development of this feature here: https://issues.apache.org/jira/browse/CASSANDRA-4292


answered Sep 21 '22 20:09

zznate
