
How do you use the Cassandra tool sstableloader?

Tags: cassandra

I'm trying to use sstableloader to load data into an existing Cassandra ring, but I can't figure out how to get it to work. I'm running it on a machine that already has a running Cassandra node on it, and when I run it I get an error saying that port 7000 is already in use, which is the port the running Cassandra node is using for gossip.

So does that mean I can only use sstableloader on a machine that is in the same network as the target Cassandra ring, but isn't actually running a Cassandra node?

Any details would be useful, thanks.

Turbo asked Jul 26 '11 15:07



2 Answers

I played around with sstableloader, read the source code, and finally figured out how to run it on the same machine that hosts a running Cassandra node. There are two key points. First, you need to create a copy of the Cassandra install folder for sstableloader. This is because sstableloader reads the yaml file to figure out which IP address to use for gossip, and the existing yaml file is already being used by the running Cassandra node. Second, you'll need to create a new loopback IP address (something like 127.0.0.2) on your machine. Once this is done, change the yaml file in the copied Cassandra install folder to listen on this IP address.
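As a rough illustration of the second step, here is a minimal Python sketch that rewrites the `listen_address` setting in the text of a copied yaml file. The function name `set_listen_address` and the sample yaml contents are made up for this example; only the `listen_address` key and the 127.0.0.2 address come from the steps above. It edits text in memory, so reading and writing the copied file is left to you.

```python
import re


def set_listen_address(yaml_text: str, new_address: str) -> str:
    """Return yaml_text with the top-level listen_address set to new_address.

    Hypothetical helper for illustration; it only touches an uncommented,
    top-level `listen_address:` line and leaves everything else unchanged.
    """
    replacement = f"listen_address: {new_address}"
    pattern = re.compile(r"^listen_address:.*$", re.MULTILINE)
    # Use a function as the replacement so the address is inserted literally.
    new_text, count = pattern.subn(lambda _match: replacement, yaml_text, count=1)
    if count == 0:
        # No uncommented key found: append the setting at the end instead.
        new_text = yaml_text.rstrip("\n") + "\n" + replacement + "\n"
    return new_text


# Assumed sample contents; a real cassandra.yaml has many more settings.
sample = (
    "cluster_name: 'Test Cluster'\n"
    "listen_address: localhost\n"
    "rpc_address: localhost\n"
)
updated = set_listen_address(sample, "127.0.0.2")
print(updated)
```

Run this against the yaml file in the copied install folder only, so the configuration of the running node stays untouched.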

I wrote a tutorial that goes into more detail about how to do this here: http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx

Turbo answered Oct 22 '22 14:10


The Austin Cassandra Users Group just had a presentation on this: http://www.slideshare.net/alex_araujo/etl-with-cassandra-streaming-bulk-loading/

zznate answered Oct 22 '22 15:10