I created a Cassandra column family and need to load data into it from a CSV file. The CSV file is 15 GB in size.
I am using the CQL 'COPY FROM' command, but loading the data takes a long time. What is the best/simplest way to load large amounts of data into Cassandra from CSV files?
The cqlsh built-in COPY TO/FROM commands for CSV files are pretty simple and are intended for small to moderately sized data sets. You didn't mention which Cassandra version you're using, but a lot of performance improvements were made in 2.1.5 (CASSANDRA-8225).
An alternative tool that has had good results for larger data is cassandra-loader. You could try that with a subset of your file (like 1000 rows) to confirm it works, then try with your whole file to see the performance.
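To cut a small test file from the full CSV before trying either tool, something like the following Python sketch could work. It streams the input, so the 15 GB file is never held in memory. The function name `write_csv_sample`, the file names, and the assumption that the file is UTF-8 with a header row are all illustrative, not taken from the question.

```python
import csv
from itertools import islice


def write_csv_sample(src_path, dst_path, max_rows=1000, has_header=True):
    """Copy the header (if any) plus the first max_rows data rows to dst_path.

    Hypothetical helper; returns the number of data rows written.
    """
    written = 0
    # newline="" lets the csv module handle quoted fields that contain newlines
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        # islice stops after max_rows rows without reading the rest of the file
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
    return written
```

For example, `write_csv_sample("data.csv", "sample.csv")` would write the header and the first 1000 rows of a hypothetical `data.csv` to `sample.csv`. Pass `has_header=False` if the file has no header row.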