
Fetch all rows in cassandra

Tags:

cassandra

I have a Cassandra table containing 3 million rows. I am trying to fetch all the rows and write them to several CSV files. I know it is not feasible to simply run select * from mytable in one go. Could someone please tell me how I can do this?

Or is there any way to read the rows n at a time, without specifying any WHERE conditions?

Benson asked May 19 '14

People also ask

How do I select distinct rows in Cassandra?

Use the DISTINCT keyword to return only distinct (different) values of partition keys. The FROM clause specifies the table to query. You may want to precede the table name with the name of the keyspace followed by a period (.). If you do not specify a keyspace, Cassandra queries the current keyspace.
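For example, assuming a hypothetical table users in keyspace myks, partitioned by user_id:

```cql
-- Returns each partition key value exactly once
SELECT DISTINCT user_id FROM myks.users;
```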


3 Answers

As far as I know, one improvement in Cassandra 2.0 on the driver side is automatic paging. You can do something like this:

Statement stmt = new SimpleStatement("SELECT * FROM images LIMIT 3000000");
stmt.setFetchSize(100);
ResultSet rs = session.execute(stmt);

// The driver transparently fetches the next page (100 rows here)
// as you iterate over the ResultSet:
for (Row row : rs) {
    // process each row here
}

For more, read "Improvements on the driver side with Cassandra 2.0".

You can find the driver here.
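Since the question is ultimately about writing the rows out to several CSV files, here is a minimal sketch (in Python; the function name, column names, and demo data are all illustrative, not part of any driver API) of splitting any iterable of rows into fixed-size CSV files. In practice, the paged result set returned by a driver yields rows lazily as pages are fetched, so it can be passed in directly as the rows argument:

```python
import csv
import os
import tempfile

def write_rows_in_chunks(rows, out_dir, header, rows_per_file=1000):
    """Write an iterable of rows into numbered CSV files,
    starting a new file every `rows_per_file` rows."""
    writer = None
    f = None
    count = 0
    file_index = 0
    paths = []
    for row in rows:
        if count % rows_per_file == 0:
            # Close the previous part file and start a new one
            if f:
                f.close()
            path = os.path.join(out_dir, f"part-{file_index:04d}.csv")
            paths.append(path)
            f = open(path, "w", newline="")
            writer = csv.writer(f)
            writer.writerow(header)
            file_index += 1
        writer.writerow(row)
        count += 1
    if f:
        f.close()
    return paths

# Demo with in-memory rows; with a real cluster, `rows` could be the
# paged result set from the driver instead of this generator.
out = tempfile.mkdtemp()
paths = write_rows_in_chunks(((i, f"name-{i}") for i in range(25)),
                             out, header=("id", "name"), rows_per_file=10)
print(len(paths))  # 3 files: 10 + 10 + 5 rows
```

Because the rows are consumed one at a time, memory use stays flat no matter how many millions of rows the table holds.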

Taher Khorshidi answered Oct 05 '22


You could use Pig to read the data and store it into HDFS, then copy it out as a single file:

In Pig:

data = LOAD 'cql://your_ksp/your_table' USING CqlStorage();
STORE data INTO '/path/to/output' USING PigStorage(',');

From OS shell:

hadoop fs -copyToLocal hdfs://hadoop_url/path/to/output /path/to/local/storage
rs_atl answered Oct 05 '22


By default, a SELECT statement returns only a limited number of rows (cqlsh, for example, historically applied a default LIMIT of 10,000), so to retrieve more records than that you have to specify a LIMIT explicitly:

SELECT * FROM tablename LIMIT 3000000;   -- in your case, 3 million rows
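A common alternative to one huge LIMIT is to page through the table by the token of the partition key, which walks the table in contiguous slices without any condition on the key values themselves (the column names here are hypothetical, and the placeholder must be filled in with the actual value from the previous page):

```cql
-- First page
SELECT id, data, token(id) FROM mytable LIMIT 10000;

-- Next page: resume after the last token seen on the previous page
SELECT id, data, token(id) FROM mytable
WHERE token(id) > <last token from previous page> LIMIT 10000;
```

This is essentially what the automatic paging in the 2.0 drivers (see the first answer) does for you under the hood.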

Helping Hand.. answered Oct 05 '22