What are best practices for backing up a Cassandra cluster?

I have a Cassandra cluster with ~20 nodes in multiple datacenters. I want to back up the Cassandra database, and it must be possible to restore the backup to a new cluster even if every node in the existing one is simultaneously hit by a meteor.

What exactly do I need to copy off the server(s) and preserve in order to make a from-scratch restore of a Cassandra database possible, and where are these items stored? I gather that this is not as simple as "take a snapshot and rsync it somewhere".
How do I perform the backup and restore?
Where is this process documented?

287

asked Jul 15 '15 17:07

Andrew

1 Answer

Traditional "backup and restore" info can be found here: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_restore_c.html

Essentially, you take a snapshot on each node and back up the resulting files. It really is pretty much "take a snapshot and rsync it somewhere". Incremental backups can help reduce backup sizes. The link explains this in more detail.
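The snapshot-and-copy step on a single node could be sketched roughly as below, in Python. This is only an illustration under some assumptions: the data directory is the common default `/var/lib/cassandra/data`, snapshots land under `<keyspace>/<table>-<id>/snapshots/<tag>/`, and the keyspace name `app`, the tag `nightly`, and the rsync destination are all hypothetical. It would have to be run on every node, and it does not cover saving the schema or restoring.

```python
import subprocess
from pathlib import Path


def snapshot_command(tag, keyspace):
    """Build the nodetool call that snapshots one keyspace on the local node."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def find_snapshot_dirs(data_dir, keyspace, tag):
    """List the per-table snapshot directories created for the given tag."""
    return sorted(Path(data_dir, keyspace).glob(f"*/snapshots/{tag}"))


def rsync_command(snapshot_dir, data_dir, dest):
    """Build an rsync call that keeps the keyspace/table layout at the destination.

    With --relative, rsync recreates the path that follows the "/./" marker.
    """
    rel = Path(snapshot_dir).relative_to(data_dir)
    return ["rsync", "-a", "--relative", f"{data_dir}/./{rel}/", dest]


def backup_node(data_dir, keyspace, tag, dest):
    """Snapshot one keyspace on this node and copy the snapshot files away."""
    subprocess.run(snapshot_command(tag, keyspace), check=True)
    for snapshot_dir in find_snapshot_dirs(data_dir, keyspace, tag):
        subprocess.run(rsync_command(snapshot_dir, data_dir, dest), check=True)


if __name__ == "__main__":
    # Hypothetical values; adjust to the real keyspace and backup host.
    backup_node("/var/lib/cassandra/data", "app", "nightly",
                "backup@backup-host:/backups/node1")
```

Giving each node its own destination directory keeps files from different nodes apart, since SSTable file names are not unique across nodes.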

However, if all you want is a "secondary" that can take over if the machines get hit by a meteor, a common approach is to add another datacenter (often with fewer nodes) and set the replication factor on the keyspace(s) so that data is also replicated to this "backup" datacenter. Your apps would normally write to the "main" datacenter at LOCAL_QUORUM, while the backup datacenter serves, well, as a backup. If the backup datacenter is powerful enough, it can even serve as a hot standby.
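As a rough sketch of the replication change, the Python helper below builds the CQL statement that such a setup would typically use. The keyspace name `app`, the datacenter names `main_dc` and `backup_dc`, and the replica counts are hypothetical; the datacenter names must match what the cluster's snitch reports. It assumes the keyspace uses NetworkTopologyStrategy, which is what allows a replication factor per datacenter.

```python
def alter_replication_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement with one replication factor per datacenter."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def rebuild_command(source_dc):
    """Build the nodetool call that streams existing data from another datacenter."""
    return ["nodetool", "rebuild", "--", source_dc]


if __name__ == "__main__":
    # Hypothetical names: 3 replicas in the main DC, 2 in the backup DC.
    print(alter_replication_cql("app", {"main_dc": 3, "backup_dc": 2}))
    print(" ".join(rebuild_command("main_dc")))
```

The statement can be run through cqlsh. Data written before the change is not copied automatically, so the rebuild command would be run on each node of the new datacenter to stream it over.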

With this setup, Cassandra replicates data to the backup datacenter as it is written. This avoids cumbersome snapshot-based backups with files stored on a network. However, it will not protect against a developer mistakenly deleting data from Cassandra, because the delete is replicated too. Things like DROP KEYSPACE can be recovered for a certain period of time, but if you mistakenly delete some rows, they're gone.

Hope that helps.

148

answered Nov 16 '22 02:11

ashic

Tags:

cassandra

cassandra-2.0