I have a keyspace populated with data that was expensive to generate. I want two copies of this data within my cluster. I would like to end up with two keyspaces, let's call them mydata and mydatabackup, both of which contain identical data (I don't mind if the Cassandra timestamps are different).
Is there an easy way to do this? The closest thing I can find to an answer is to use sstable2json and json2sstable, as suggested in response to a similar question. Is there a better way?
To select a keyspace in Cassandra and perform actions on it, use the USE keyword followed by the keyspace name. The CQL shell then switches to the keyspace you specified. To change the current keyspace, use the same command with another name. Note: Whenever you create a table in Cassandra, you start by defining the keyspace.
Changing the Replication Factor for SimpleStrategy: If you want to change the replication factor of a keyspace, you can do it by executing the ALTER KEYSPACE command, which has the following syntax: Syntax: ALTER KEYSPACE "KeySpace Name" WITH replication = {'class': 'Strategy name', 'replication_factor' : 'No.
Creating a database with multiple keyspaces allows you to create different data models for each keyspace or store unique data in unique keyspaces. Multiple keyspaces within a single region allow for an application built on a per-keyspace data model.
"Is there a better way?"
All Cassandra data is stored in the data/ folder (check the data_file_directories setting in cassandra.yaml). You may also want to check the saved_caches_directory and commitlog_directory settings.
Inside the data folder, you'll have:
One folder per keyspace
One folder for the system keyspace
Some folders for authentication, etc.
Inside each keyspace folder, you'll have:
*-Data.db files which contain your real data
*-Filter.db files
*-Index.db files for index
...
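As a rough illustration of the layout above, here is a minimal Python sketch that lists the *-Data.db files for one keyspace. The function name and the example file names are my own, and it assumes the data directory is readable from where the script runs. It searches recursively because newer Cassandra versions put one subfolder per table inside the keyspace folder, and it skips anything under a snapshots folder.

```python
from pathlib import Path


def list_sstable_data_files(data_dir, keyspace):
    """Return the sorted *-Data.db files found under data_dir/keyspace.

    Hypothetical helper: data_dir should match data_file_directories
    in cassandra.yaml.
    """
    keyspace_dir = Path(data_dir) / keyspace
    found = []
    # rglob also descends into per-table subfolders, if there are any.
    for path in keyspace_dir.rglob("*-Data.db"):
        relative_parts = path.relative_to(keyspace_dir).parts
        # Skip copies that live under a snapshots folder.
        if path.is_file() and "snapshots" not in relative_parts:
            found.append(path)
    return sorted(found)
```

Calling list_sstable_data_files("/var/lib/cassandra/data", "mydata") would return the live data files for mydata, assuming that is where your data directory lives.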
To replicate data, you do a plain copy of those folders.
On our team, the ops people use a crontab to schedule regular backups of Cassandra data this way.
Note: a plain copy may miss live data that is still in a memtable and has not yet been flushed to disk. Running nodetool flush first writes the memtables to disk. You can also trigger a full compaction before backing up the data files, but a full compaction may hurt performance, so be careful.
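The "flush, then copy the folder" idea could be sketched like this. It uses nodetool flush (a real nodetool subcommand that writes memtables to disk) rather than a full compaction; the function names, the backup_root parameter and the injectable run argument are my own assumptions, added so the copy logic can be tried without a live node.

```python
import shutil
import subprocess
from pathlib import Path


def flush_command(keyspace):
    """Build the nodetool call that flushes a keyspace's memtables to disk."""
    return ["nodetool", "flush", keyspace]


def backup_keyspace_folder(data_dir, keyspace, backup_root,
                           run=subprocess.check_call):
    """Flush the keyspace, then copy its folder to backup_root/keyspace.

    `run` is a hypothetical hook that executes the command; by default
    it calls nodetool, which must be on the PATH of the Cassandra host.
    """
    run(flush_command(keyspace))
    source = Path(data_dir) / keyspace
    target = Path(backup_root) / keyspace
    # copytree raises an error if the target folder already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Writes that arrive between the flush and the end of the copy can still be missed, so treat this as a sketch of the approach, not a consistent backup.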
Better answer: use the provided tool to take a snapshot of your DB:
http://www.datastax.com/docs/1.0/operations/backup_restore
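For reference, taking a snapshot from a script might look like the sketch below. It wraps nodetool snapshot with the -t option to name the snapshot; the function names, the tag value and the injectable run argument are assumptions of mine, and the exact options can differ between Cassandra versions, so check the linked docs for yours.

```python
import subprocess


def snapshot_command(keyspace, tag):
    """Build the nodetool call that snapshots one keyspace under a tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def take_snapshot(keyspace, tag, run=subprocess.check_call):
    """Run the snapshot command on the local node.

    `run` is a hypothetical hook; the default executes nodetool, which
    must be on the PATH of the Cassandra host.
    """
    run(snapshot_command(keyspace, tag))
```

For example, take_snapshot("mydata", "before_copy") would ask the local node to snapshot mydata. Cassandra stores snapshots as hard links in a snapshots folder under the data directory, and the command has to be run on every node that holds the data.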