
Cassandra: Moving Data to Another New Cassandra Node

Tags:

cassandra

I have one production Cassandra node and would like to create the same Cassandra setup on my local machine. As I understand it, I have the following options:

 1. Take a snapshot of each keyspace in production and use it on the local machine. (This would take a long time, as I have many keyspaces.)
 2. Export the production Cassandra data as CSV and import it into the local Cassandra. (I have a COUNTER table, so this also causes some headaches. Correct me if I'm wrong.)

My question is: what will happen if I move the entire data directory and commit log folder from production to my local machine and then start the local Cassandra? Is that possible at all?

When I tried this, Cassandra threw many errors.

JsLearner asked Jun 15 '16 12:06



2 Answers

If all you're looking to do is recreate your production node on a local machine, then all you really need to do is copy everything (assuming hardware is similar).

From Production:

  • Flush the data from memtables to disk (nodetool flush).
  • Run nodetool snapshot to take snapshots of all your column families/keyspaces.
  • Make sure you have the CQL scripts that created your column families/keyspaces.
  • Copy the config files and the commitlog, saved_caches, logs, and data directories.
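
As a rough illustration of the production-side steps above, here is a minimal Python sketch. It assumes nodetool and cqlsh are on the PATH of the production node and can connect with their default settings. The snapshot tag, the schema.cql file name, and the helper names are hypothetical, chosen only for this example.

```python
import subprocess


def production_commands(snapshot_tag):
    """Return the production-side commands, in the order they should run."""
    return [
        # Flush memtables to disk so the SSTables are up to date.
        ["nodetool", "flush"],
        # Snapshot all keyspaces under one tag (tag name is an assumption).
        ["nodetool", "snapshot", "-t", snapshot_tag],
        # Print the CQL needed to recreate every keyspace and table.
        ["cqlsh", "-e", "DESCRIBE SCHEMA"],
    ]


def run_production_steps(snapshot_tag, schema_file="schema.cql"):
    """Run the commands and save the schema dump to schema_file."""
    flush, snapshot, describe = production_commands(snapshot_tag)
    subprocess.run(flush, check=True)
    subprocess.run(snapshot, check=True)
    with open(schema_file, "w") as out:
        subprocess.run(describe, check=True, stdout=out)


if __name__ == "__main__":
    # Only prints the commands; call run_production_steps() to execute them.
    for cmd in production_commands("local_copy"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them by hand before running anything against production.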

To your local machine (assuming a fresh install):

  • Install Cassandra (make sure it's the same version as production).
  • Recreate the column families/keyspaces using the scripts you copied from production.
  • Copy over the config files (editing them for the local machine as needed), along with the saved_caches, logs, and data directories.
  • Place the snapshots in the proper directories.
    • Possibly something like <data_dir>/<keyspace>/<columnfamily>/
  • Start up Cassandra.
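
The "place snapshots in the proper directories" step could be sketched roughly as follows. This assumes the copied production tree keeps the layout that nodetool snapshot produces (<data_dir>/<keyspace>/<table>/snapshots/<tag>/), that the local table directories have the same names as in production, and that the local Cassandra is stopped while files are copied. The function name and the tag are hypothetical.

```python
import shutil
from pathlib import Path


def place_snapshot(prod_data_dir, local_data_dir, tag):
    """Copy each table's snapshot files into the matching local table directory.

    Source:      <prod_data_dir>/<keyspace>/<table>/snapshots/<tag>/*
    Destination: <local_data_dir>/<keyspace>/<table>/
    Returns the list of destination paths that were written.
    """
    copied = []
    for snap in sorted(Path(prod_data_dir).glob(f"*/*/snapshots/{tag}")):
        table_dir = snap.parent.parent        # .../<keyspace>/<table>
        keyspace = table_dir.parent.name
        dest = Path(local_data_dir) / keyspace / table_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in sorted(snap.iterdir()):
            if f.is_file():                   # copies every regular file in the snapshot
                shutil.copy2(f, dest / f.name)
                copied.append(dest / f.name)
    return copied
```

For example, place_snapshot("/backup/prod_data", "/var/lib/cassandra/data", "local_copy") would fill the local table directories from a copied production tree; both paths here are placeholders.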

Note: These checklists are not exhaustive.

Running nodetool repair isn't a bad idea in this case. However, if you just want to recreate the production node on a local machine (as stated in the question), it might be moot, since the snapshot already contains the current data. Running nodetool cleanup wouldn't hurt either, if repair was deemed essential.

Answering your question:

Just copying the data directory and commitlogs from production onto your local machine won't really work, because you first need to recreate the keyspaces and column families to put the data in. If you did do that and still got errors, then something else is at work. To get the data from one Cassandra environment to another, the config files, the data directories (commitlogs, data, saved_caches, etc.), and the schema scripts are the most important pieces. From there you can probably debug the issues. A fresh install (or remapping the current data/commitlog/etc. directories to new directories, e.g. new_data, new_commitlog, new_saved_cache) might be the easiest way to accomplish the task.

K.Boyette answered Oct 14 '22 09:10


If you have one node, you can copy the /data, /saved_caches, and /commitlog folders to your local machine. You need the same version of Cassandra. But first, export your schema(s) from production and import them into your local machine. Then stop the local Cassandra, delete any contents of the local /commitlog folder, and copy the data from prod into local. The folder names in /data will probably be different, because newer versions of C* append a UUID to the table-name folders, but it will work. You may have to run nodetool repair afterward.
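
Because the table folder names can differ between the two machines, it may help to pair them up by table name before copying. The sketch below assumes the suffix is a dash followed by 32 lowercase hex characters (for example users-<32 hex chars>); the helper names are hypothetical, and this only builds the mapping without copying anything.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def table_name(dir_name):
    """Strip a trailing '-<32 hex chars>' suffix, if present."""
    base, sep, suffix = dir_name.rpartition("-")
    if sep and len(suffix) == 32 and set(suffix) <= HEX_DIGITS:
        return base
    return dir_name


def map_table_dirs(prod_keyspace_dir, local_keyspace_dir):
    """Map each production table directory to the local directory of the same table.

    The value is None when the local keyspace has no directory for that table,
    which usually means the schema was not imported for it.
    """
    local = {
        table_name(p.name): p
        for p in Path(local_keyspace_dir).iterdir()
        if p.is_dir()
    }
    return {
        p: local.get(table_name(p.name))
        for p in Path(prod_keyspace_dir).iterdir()
        if p.is_dir()
    }
```

Any production directory that maps to None is worth checking against the exported schema before starting the local node.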

LHWizard answered Oct 14 '22 10:10