
Understanding "Number of keys" in nodetool cfstats

I am new to Cassandra. In this example I am using a cluster with 1 DC and 5 nodes, and NetworkTopologyStrategy with a replication factor of 3.

   Keyspace: activityfeed
            Read Count: 0
            Read Latency: NaN ms.
            Write Count: 0
            Write Latency: NaN ms.
            Pending Tasks: 0
                    Table: feed_shubham
                    SSTable count: 1
                    Space used (live), bytes: 52620684
                    Space used (total), bytes: 52620684
                    SSTable Compression Ratio: 0.3727660543119897
                    Number of keys (estimate): 137984
                    Memtable cell count: 0
                    Memtable data size, bytes: 0
                    Memtable switch count: 0
                    Local read count: 0
                    Local read latency: 0.000 ms
                    Local write count: 0
                    Local write latency: 0.000 ms
                    Pending tasks: 0
                    Bloom filter false positives: 0
                    Bloom filter false ratio: 0.00000
                    Bloom filter space used, bytes: 174416
                    Compacted partition minimum bytes: 771
                    Compacted partition maximum bytes: 924
                    Compacted partition mean bytes: 924
                    Average live cells per slice (last five minutes): 0.0
                    Average tombstones per slice (last five minutes): 0.0

What does "Number of keys" mean here? I have 5 nodes in my cluster, and when I run the command below against each node separately, I get different statistics for the same table.

nodetool cfstats -h 192.168.1.12 activityfeed.feed_shubham

From the output above, I infer that cfstats reports statistics about the data physically stored on each node.

I went through the doc at http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFstats.html but did not find an explanation of "Number of keys" there.

I am using RandomPartitioner.

Does this key have anything to do with the partition key?

I have around 200,000 records in my table.

Yasmeen asked Jan 15 '15 12:01




2 Answers

The number of keys represents the number of partition keys stored on that node for the table. It is only an estimate, though, and how accurate it is depends on your version of Cassandra. Before 2.1.6, it summed the number of partitions listed in the index file of each SSTable. From 2.1.6 onward, it merges a sketch of the data (HyperLogLog) that is stored per SSTable.
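To make the difference between the two estimation approaches concrete, here is a minimal Python sketch. It is not Cassandra's code: the function names are hypothetical, and plain Python sets stand in for the per-SSTable HyperLogLog sketches, so the "merged" count here is exact rather than approximate. The per-node figure also assumes that the roughly 200,000 records are one row per partition and that data is spread evenly, neither of which is confirmed by the question.

```python
def expected_partitions_per_node(total_partitions, replication_factor, node_count):
    """Average number of partition replicas per node, assuming even distribution."""
    return total_partitions * replication_factor / node_count


def estimate_by_summing(sstables):
    """Older style: add up the partition count of every SSTable.

    A partition that appears in several SSTables is counted once per SSTable.
    """
    return sum(len(keys) for keys in sstables)


def estimate_by_merging(sstables):
    """Newer style: merge per-SSTable summaries, then count distinct keys.

    Sets are used here as an exact stand-in for HyperLogLog sketches.
    """
    merged = set()
    for keys in sstables:
        merged |= keys
    return len(merged)


# Two hypothetical SSTables that share the partitions "b" and "c".
sstables = [{"a", "b", "c"}, {"b", "c", "d"}]

print(estimate_by_summing(sstables))   # 6: "b" and "c" are counted twice
print(estimate_by_merging(sstables))   # 4: distinct partitions only

# 5 nodes, replication factor 3, about 200,000 partitions (assumed).
print(expected_partitions_per_node(200_000, 3, 5))  # 120000.0
```

With a single SSTable, as in the output in the question, both approaches give the same number. Under the stated assumptions each node would hold about 120,000 partitions on average, so a per-node estimate in the range of 137,984 that differs from node to node is plausible.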

Chris Lohfink answered Oct 16 '22 15:10



This value seems to indicate the total number of columns/cells in all local SSTables. I guess it should rather be named "SSTable cell count", just like the corresponding memtable value. However, because SSTables store redundant data before compaction, this value will not necessarily correspond to the actual number of columns returned as part of a result set.

Stefan Podkowinski answered Oct 16 '22 14:10
