 

How to obtain number of rows in Cassandra table

This is a super basic question but it's actually been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra?

I will be moving several hundred million rows into C* for some load testing, and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network.

The best idea I have is to loop over each row with Python and increment a counter. Is there a better way to determine (or even estimate) the number of rows in a C* table? I've also poked around DataStax OpsCenter to see if I can find the row count there. If it's possible, I don't see how.

Has anyone else needed to get a count(*) of a table in C*? If so, how'd you go about doing it?

Evan Volgas asked Oct 28 '14 23:10



2 Answers

Yes, you can use COUNT(*). From the documentation:

A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.

Count the number of rows in the users table:

SELECT COUNT(*) FROM users; 
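
Since the question mentions Python, here is a minimal sketch of running the same query from a script. It assumes `session` behaves like a `Session` from the DataStax Python driver (`cassandra-driver`), typically obtained with `Cluster([...]).connect()`. The function name `count_rows` and the 120-second default are made up for illustration. The `timeout` argument only controls how long the client waits; the server-side read timeout still applies.

```python
def count_rows(session, keyspace, table, timeout=120.0):
    """Run SELECT COUNT(*) on keyspace.table and return the count as an int.

    `session` is assumed to behave like cassandra.cluster.Session:
    execute(query, timeout=...) returns a result set whose one() method
    yields a row with the count in its first column.
    """
    # Keyspace and table names cannot be bound as query parameters, so they
    # are interpolated here; only pass trusted identifiers.
    query = f"SELECT COUNT(*) FROM {keyspace}.{table}"
    row = session.execute(query, timeout=timeout).one()
    return int(row[0])
```

On a large table this can still time out on the server, which is the problem the next answer works around.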
catpaws answered Sep 25 '22 04:09


You can use COPY to avoid the Cassandra timeout that usually happens on COUNT(*):

cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'
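
A different technique from the COPY command above, sketched here as an alternative: split the token ring into slices and sum a COUNT(*) per slice, so each query scans only a fraction of the table. This sketch assumes the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), a single-column partition key, and a `session` that behaves like the DataStax Python driver's `Session`. The names `split_token_ring` and `count_by_token_range` and the default of 256 slices are hypothetical.

```python
MIN_TOKEN = -(2 ** 63)      # lowest Murmur3Partitioner token (assumed partitioner)
MAX_TOKEN = 2 ** 63 - 1     # highest Murmur3Partitioner token


def split_token_ring(splits):
    """Return `splits` disjoint, inclusive (low, high) ranges covering the ring."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of distinct tokens, 2**64
    ranges = []
    low = MIN_TOKEN
    for i in range(1, splits + 1):
        high = MIN_TOKEN + (total * i) // splits - 1
        ranges.append((low, high))
        low = high + 1
    return ranges


def count_by_token_range(session, keyspace, table, partition_key, splits=256):
    """Sum COUNT(*) over token slices instead of issuing one full-table count."""
    # Identifiers cannot be bound as parameters; only pass trusted names.
    query = (
        f"SELECT COUNT(*) FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) >= %s AND token({partition_key}) <= %s"
    )
    total = 0
    for low, high in split_token_ring(splits):
        total += int(session.execute(query, (low, high)).one()[0])
    return total
```

More slices mean smaller, faster queries at the cost of more round trips. If rows are being written while the slices run, the sum is an approximation and not a snapshot.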

Shubham answered Sep 22 '22 04:09