
How should I copy a keyspace within a cluster

Tags: cassandra

I have a keyspace populated with data that was expensive to generate. I want two copies of this data within my cluster. I would like to end up with two keyspaces, let's call them mydata and mydatabackup, both of which contain identical data (I don't mind if the Cassandra timestamps are different).

Is there an easy way to do this? The closest thing I can find to an answer is to use sstable2json and json2sstable, as suggested in response to a similar question. Is there a better way?

lorcan asked Sep 13 '13 16:09




1 Answer

"Is there a better way?"

All Cassandra data is stored in the data/ folder (check the config value data_file_directories in cassandra.yaml). You may also want to check the saved_caches_directory and commitlog_directory settings.

Inside the data folder, you'll have:

  1. One folder per keyspace

  2. One folder for the system keyspace

  3. Some folders for authentication, etc.

Inside each keyspace folder, you'll have:

  1. *-Data.db files, which contain your real data

  2. *-Filter.db files

  3. *-Index.db files for the index

  4. ...
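To see what a keyspace folder actually contains, something like the following minimal sketch might help. The function name count_sstable_components is made up for illustration, and it assumes component files are named "...-<Component>.db" as in the list above. It searches subfolders too, since some Cassandra versions keep a folder per table inside the keyspace folder.

```python
from collections import Counter
from pathlib import Path


def count_sstable_components(keyspace_dir):
    """Count *.db files under keyspace_dir by component (Data, Index, Filter, ...).

    Hypothetical helper: assumes file names end in "-<Component>.db".
    """
    counts = Counter()
    for path in Path(keyspace_dir).rglob("*.db"):
        # "mydata-users-hd-1-Data.db" -> stem "mydata-users-hd-1-Data" -> "Data"
        component = path.stem.rsplit("-", 1)[-1]
        counts[component] += 1
    return dict(counts)
```

Calling it on a keyspace folder returns a dict mapping each component name to the number of files found.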

To replicate data, you do a plain copy of those folders.

On our team, the ops people use a crontab to schedule regular backups of Cassandra data this way.
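A plain folder copy like this could be scripted roughly as follows. This is only a sketch: copy_keyspace_dir and the timestamped target name are my own choices for illustration, not what any particular ops team runs. The paths are whatever your data_file_directories setting points at.

```python
import shutil
import time
from pathlib import Path


def copy_keyspace_dir(data_dir, keyspace, backup_root):
    """Copy <data_dir>/<keyspace> into a timestamped folder under backup_root.

    Hypothetical helper: a plain file copy, nothing Cassandra-specific.
    """
    source = Path(data_dir) / keyspace
    if not source.is_dir():
        raise FileNotFoundError(f"no keyspace folder at {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{keyspace}-{stamp}"
    # copytree creates the target folder and fails if it already exists
    shutil.copytree(source, target)
    return target
```

A script calling copy_keyspace_dir(data_dir, "mydata", backup_root) is the kind of thing a crontab entry could run on a schedule.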

Note: you may miss live data that is still in memory (in a memtable) and not yet flushed to disk. You can trigger a full compaction before backing up the data files, but a full compaction may hurt your performance, so be careful.
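For the memtable problem specifically, nodetool also has a flush command that writes memtables to disk, which may be enough before a copy. Here is a small sketch of calling it from a script. The helper names are hypothetical, and it assumes nodetool is on the PATH and the node listens on the given host.

```python
import subprocess


def flush_command(keyspace, host="localhost"):
    """Build the nodetool call that flushes one keyspace's memtables to disk."""
    return ["nodetool", "-h", host, "flush", keyspace]


def flush_keyspace(keyspace, host="localhost"):
    """Run the flush; raises CalledProcessError if nodetool exits non-zero."""
    subprocess.run(flush_command(keyspace, host), check=True)
```

Running flush_keyspace("mydata") just before the file copy narrows the window in which unflushed writes can be missed. It does not close it completely if writes keep arriving.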


Better answer: use the provided tool to take a snapshot of your DB:

http://www.datastax.com/docs/1.0/operations/backup_restore
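The tool in question is nodetool snapshot. A rough sketch of driving it from a script is below. The helper names and the tag value are made up for illustration, and the exact flag placement can differ between Cassandra versions, so check nodetool help snapshot on your install.

```python
import subprocess


def snapshot_command(keyspace, tag, host="localhost"):
    """Build the nodetool call that snapshots one keyspace under a named tag."""
    return ["nodetool", "-h", host, "snapshot", "-t", tag, keyspace]


def take_snapshot(keyspace, tag, host="localhost"):
    """Run the snapshot; raises CalledProcessError if nodetool exits non-zero."""
    subprocess.run(snapshot_command(keyspace, tag, host), check=True)
```

For example, take_snapshot("mydata", "before-copy") asks the local node to snapshot mydata. The snapshot files are hard links placed in a snapshots/ folder inside the data directory, so taking one is cheap. A snapshot only covers the node it runs on, so you would repeat this on every node in the cluster.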

doanduyhai answered Oct 21 '22 03:10