I have a cassandra cluster with ~20 nodes in multiple datacenters. I want to back up the cassandra database. I want it to be possible to restore the backup to a new cluster even if every node in the existing one is simultaneously hit by a meteor.
By default, incremental backup is disabled in Cassandra. This can be enabled by changing the value of “incremental_backups” to “true” in the cassandra. yaml file. Once enabled, Cassandra creates a hard link to each memtable flushed to SSTable to a backup's directory under the keyspace data directory.
To manually back up a keyspace, you must use the nodetool snapshot command to create hard links, run a custom script to back up these links, then archive that backup. It is recommended to take a snapshot backup on a daily basis. You must repeat these steps for each keyspace to back up.
A snapshot is a copy of a table's SSTable files at a given time, created via hard links. The DDL to create the table is stored as well. Snapshots may be created by a user or created automatically. The setting snapshot_before_compaction in the cassandra.
Traditional "backup and restore" info can be found here: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_restore_c.html
Essentially, you take snapshot on each machine, and back the files up. Pretty much "take a snapshot and rsync it somewhere"!! Incremental backups can help reduce backup sizes, etc. The link explains it in more detail.
However, if all you want is a "secondary" which can be used if the machines get hit by a meteor, then a common approach is to have another data center (often with fewer nodes), and set the replication factor on the keyspace(s) so that the "backup" datacenter has data replicated to. Your apps would normally use local quorum to write to the "main" datacenter, while the backup will serve...well...as a backup. If the backup dc is powerful, it can even serve as a hot backup.
With this setup, cassandra will stream data to the backup as it's added. This prevents cumbersome snapshot based backups with files stored on a network. However, this will not protect from a dev mistakenly deleting data off cassandra. (things like drop keyspace ... can be recovered up to a certain time period, but if you mistakenly delete some rows...they're gone).
Hope that helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With