I have a Cassandra table containing 3 million rows. Now I am trying to fetch all the rows and write them to several CSV files. I know it is impossible to perform select * from mytable. Could someone please tell me how I can do this?
Or is there any way to read the rows n rows at a time without specifying any where conditions?
Use the DISTINCT keyword to return only distinct (different) values of partition keys. The FROM clause specifies the table to query. You may want to precede the table name with the name of the keyspace followed by a period (.). If you do not specify a keyspace, Cassandra queries the current keyspace.
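For example, to walk the table partition by partition you could first list the partition keys, then query each partition separately (a sketch; the keyspace `mykeyspace`, table `mytable`, and partition key `id` are illustrative names, not from the question):

```sql
-- List every partition key exactly once
SELECT DISTINCT id FROM mykeyspace.mytable;

-- Then fetch each partition's rows individually, one key at a time
SELECT * FROM mykeyspace.mytable WHERE id = ?;
```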
As far as I know, one improvement in Cassandra 2.0 on the driver side is automatic paging. You can do something like this:
Statement stmt = new SimpleStatement("SELECT * FROM images LIMIT 3000000");
stmt.setFetchSize(100);                 // rows per page fetched from the cluster
ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    // process each row; the driver transparently fetches the next page
}
For more, read "Improvements on the driver side with Cassandra 2.0". You can find the driver here.
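Once the rows stream in page by page, splitting them across several CSV files reduces to simple chunking. A minimal, driver-agnostic sketch (the class and its helpers are hypothetical; the caller is assumed to convert each driver `Row` into a `String[]` of column values):

```java
import java.util.ArrayList;
import java.util.List;

// Collects rows into fixed-size CSV chunks, one inner list per output file.
public class CsvChunker {

    // Join one row's columns into a CSV line. No quoting/escaping here --
    // real data containing commas or newlines needs a proper CSV writer.
    static String toCsvLine(String[] columns) {
        return String.join(",", columns);
    }

    // Split rows into chunks of at most rowsPerFile lines each.
    static List<List<String>> chunk(Iterable<String[]> rows, int rowsPerFile) {
        List<List<String>> files = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String[] row : rows) {
            current.add(toCsvLine(row));
            if (current.size() == rowsPerFile) {
                files.add(current);          // this chunk is full
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            files.add(current);              // last partial chunk
        }
        return files;
    }
}
```

Each inner list can then be written to its own file; with a fetch size of 100 and, say, 100,000 rows per file, the 3 million rows would yield 30 CSV files without ever holding the full result set in memory.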
You could use Pig to read the data and store it into HDFS, then copy it out as a single file:
In Pig:
data = LOAD 'cql://your_ksp/your_table' USING CqlStorage();
STORE data INTO '/path/to/output' USING PigStorage(',');
From OS shell:
hadoop fs -copyToLocal hdfs://hadoop_url/path/to/output /path/to/local/storage
By default a SELECT statement returns at most 100,000 records, so if you have to retrieve more than that you have to specify a LIMIT:
SELECT * FROM tablename LIMIT 10000000;
(in your case 3 million, so specify at least that).