
Row count of a column family in Cassandra

Is there a way to get a row count (key count) of a single column family in Cassandra? get_count can only be used to get the column count.

For instance, if I have a column family containing users and want to get the number of users, how could I do it? Each user is its own row.

Henri Liljeroos asked Dec 23 '09 10:12



People also ask

How do you check row count in Cassandra?

A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.

How do you count data in Cassandra?

Base Cassandra, without any of the extra DSE-added features, can already get counts in a few ways. Using CQL, Cassandra's query language, the syntax for a standard count is “SELECT COUNT(*) FROM keyspace.table;”.


1 Answer

If you are working on a large data set and are okay with a pretty good approximation, I highly recommend using the command:

nodetool --host <hostname> cfstats 

This will dump out a list for each column family looking like this:

Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Space used (total): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
Memtable Data Size: 150297312
Memtable Switch Count: 434
Read Count: 9716802
Read Latency: 0.036 ms.
Write Count: 9716806
Write Latency: 0.024 ms.
Pending Tasks: 0
Bloom Filter False Postives: 10428
Bloom Filter False Ratio: 1.00000
Bloom Filter Space Used: 18216448
Compacted row minimum size: 771
Compacted row maximum size: 263210
Compacted row mean size: 1634

The "Number of Keys (estimate)" line is a good guess across the cluster, and reading it is much faster than explicit count approaches.
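
If you want to pull that number out in a script, here is a minimal Python sketch. It assumes the cfstats output has one "Label: value" pair per line, as in the dump above; the function name parse_key_estimates and the SAMPLE_CFSTATS text are made up for illustration, and the exact output layout may differ between Cassandra versions.

```python
from typing import Dict

# Illustrative sample only: a shortened copy of the cfstats dump shown above.
SAMPLE_CFSTATS = """\
Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
"""


def parse_key_estimates(cfstats_output: str) -> Dict[str, int]:
    """Map each column family name to its 'Number of Keys (estimate)' value.

    Hypothetical helper: assumes one 'Label: value' pair per line.
    """
    estimates: Dict[str, int] = {}
    current_cf = None
    for raw_line in cfstats_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Column Family:"):
            # Remember which column family the following lines belong to.
            current_cf = line.split(":", 1)[1].strip()
        elif line.startswith("Number of Keys (estimate):") and current_cf:
            estimates[current_cf] = int(line.split(":", 1)[1].strip())
    return estimates


print(parse_key_estimates(SAMPLE_CFSTATS))  # {'widgets': 9709824}
```

To use it against a live node, you could capture the text of the nodetool command above (for example with subprocess.run and capture_output=True) and pass it to the function instead of the sample string.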

Justin DeMaris answered Sep 21 '22 10:09
