 

What is the fastest way to load data into a Cassandra column family?

I created a Cassandra column family and need to load data into it from a CSV file. The CSV file is about 15 GB.

I am using the CQL 'COPY FROM' command, but it takes a long time to load the data. What is the best/simplest way to load large amounts of data into Cassandra from CSV files?

Pedro Cunha asked Dec 25 '22 13:12



1 Answer

The cqlsh built-in COPY TO/FROM for CSV files is pretty simple and is intended for small to moderately sized data sets. You didn't mention which Cassandra version you're using, but a lot of performance improvements to COPY FROM were made in 2.1.5 (CASSANDRA-8225).

An alternative tool that has had good results with larger data is cassandra-loader. You could try it with a subset of your file (say 1000 rows) to confirm it works, then run it on the whole file to see how it performs.
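To cut a small test subset out of a 15 GB file without loading it all into memory, something like the following Python sketch could work. It streams the file and stops after the first rows it needs. The function name `write_sample` and the file names in the usage comment are hypothetical, and it assumes a comma-delimited file whose first line is a header (pass `has_header=False` if yours has none).

```python
import csv
import itertools


def write_sample(src_path, dst_path, n_rows=1000, has_header=True):
    """Copy the first n_rows data rows (plus the header, if any) from
    src_path to dst_path. Returns the number of data rows written."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        # Carry the header line over unchanged so the sample has the
        # same column layout as the full file.
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)

        # islice stops reading after n_rows, so the rest of the large
        # file is never touched.
        written = 0
        for row in itertools.islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written


# Example usage (hypothetical file names):
#   write_sample("data.csv", "data_sample.csv", n_rows=1000)
```

Once the sample loads cleanly, the same loader command can be pointed at the full file to measure throughput.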

BrianC answered May 03 '23 15:05
