
How to migrate data from Cassandra cluster of size N to a different cluster of size N+/-M

I'm trying to figure out how to migrate data from one Cassandra cluster to another Cassandra cluster with a different ring size, say from a 5-node cluster to a 7-node cluster.

I started looking at sstable2json, since it creates a JSON file for an SSTable on a specific Cassandra node. My thought was to do this for a column family on each node in the ring. On a 5-node ring, this would give me 5 JSON files, one for the column family data that resides on each node.

Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of, let's say, 7 nodes. I was hoping that Cassandra would then replicate/balance the data evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just described, I'd end up with a ring with all the data in my column family on one node.

So can anyone help me figure out the process for migrating data from one cluster to a different cluster of a different ring size?

Turbo asked Jul 21 '11 18:07



1 Answer

Better: run bin/sstableloader on the sstables from the old ring to stream them to the new one.

Normally sstableloader is used in a sequence like this:

  1. Create sstables locally using SSTableWriter
  2. Use sstableloader to stream the data in the sstables to the right nodes (bin/sstableloader path-to-directory-full-of-sstables). The directory name is assumed to be the keyspace, which will be the case if you point it at an existing Cassandra data directory.

Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.
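As a rough sketch of what "running sstableloader against the data on each node" could look like, the Python helper below builds one `bin/sstableloader <keyspace-directory>` command per keyspace directory found under a node's data directory. The function names, the dry-run default, and the choice to skip the `system` keyspace are assumptions made for illustration and are not part of the answer above. How sstableloader locates the target cluster depends on the Cassandra version and is not shown here.

```python
import subprocess
from pathlib import Path


def build_loader_commands(data_dir, loader="bin/sstableloader"):
    """Return one sstableloader command per keyspace directory under data_dir.

    Assumes the data directory contains one subdirectory per keyspace, so
    that each directory name is the keyspace name sstableloader expects.
    """
    commands = []
    for keyspace_dir in sorted(Path(data_dir).iterdir()):
        # Assumption: skip plain files and the node-local "system" keyspace.
        if keyspace_dir.is_dir() and keyspace_dir.name != "system":
            commands.append([loader, str(keyspace_dir)])
    return commands


def stream_all(data_dir, loader="bin/sstableloader", dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in build_loader_commands(data_dir, loader):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failed load instead of continuing.
            subprocess.run(command, check=True)
```

You would call `stream_all(...)` with the node's data directory on each node of cluster A in turn, first with the default `dry_run=True` to review the commands, then with `dry_run=False` once they look right.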

More details on using sstableloader in this blog post.

jbellis answered Oct 05 '22 20:10
