
nodetool cfhistograms output

I see the following tabular data when I run 'nodetool cfhistograms':

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             2.00              0.00           8239.00               924                20
75%             4.00              0.00           9887.00              1109                20
95%             4.00              0.00          51012.00              1916                24
98%             4.00              0.00          51012.00              2299                29
99%             4.00              0.00          51012.00              2759                35
Min             0.00              0.00            150.00                73                 2
Max             4.00              0.00          51012.00              3973                60

Could someone please explain how these are calculated? I understand the percentile concept, but I want to know how many reads/writes are considered to calculate the above result.

asked Sep 27 '22 by iamtrk

1 Answer

It's now nodetool tablehistograms. Each table has a histogram for reads and writes which gets updated on completion of a local read/write. This does not include the network time spent waiting for replicas to meet the consistency level, etc.; that's nodetool proxyhistograms.

There's a bit of history here, and the implementation changed over time, so explaining the output depends on the version of Cassandra. I gave a talk at the summit a couple of years ago here that explains some of the "whys". For a while (only in 2.1) cfhistograms was reported using Metrics' exponentially decaying reservoirs, which are very inaccurate. Before 2.1 cfhistograms was displayed completely differently, but at this point that is not worth going into.

Currently they are represented by real histograms, not reservoirs (EstimatedHistogram). These histograms have fixed buckets, each one 20% larger than the previous. Since the buckets are fixed, the stored values are simply a long[] (AtomicLongArray or LongAdder[], depending on the specific version). A recorded value is counted in the bucket that covers it, so in the worst case it is reported as 20% worse than it actually is. From this histogram the percentiles are calculated using standard mechanisms.
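
To make the bucket scheme concrete, here is a minimal sketch, not Cassandra's actual EstimatedHistogram code, of a fixed-bucket histogram whose bucket bounds grow roughly 20% per bucket, with a standard percentile walk over the counts. All class and variable names are made up for the example.

import java.util.concurrent.atomic.AtomicLongArray;

// Sketch only: fixed buckets whose upper bounds grow ~20% each step,
// so a reported percentile can be up to ~20% higher than the real value.
public class SketchHistogram {
    private final long[] offsets;          // upper bound of each bucket
    private final AtomicLongArray counts;  // one counter per bucket

    public SketchHistogram(int bucketCount) {
        offsets = new long[bucketCount];
        long bound = 1;
        for (int i = 0; i < bucketCount; i++) {
            offsets[i] = bound;
            bound = Math.max(bound + 1, (long) Math.ceil(bound * 1.2)); // ~20% growth
        }
        counts = new AtomicLongArray(bucketCount);
    }

    // Record one observation in the first bucket whose bound covers it.
    public void add(long value) {
        int i = 0;
        while (i < offsets.length - 1 && offsets[i] < value) i++;
        counts.incrementAndGet(i);
    }

    // Walk the buckets until p (e.g. 0.95) of all samples are covered and
    // report that bucket's upper bound.
    public long percentile(double p) {
        long total = 0;
        for (int i = 0; i < counts.length(); i++) total += counts.get(i);
        if (total == 0) return 0;
        long threshold = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < counts.length(); i++) {
            seen += counts.get(i);
            if (seen >= threshold) return offsets[i];
        }
        return offsets[offsets.length - 1];
    }

    public static void main(String[] args) {
        SketchHistogram h = new SketchHistogram(90);
        for (long v : new long[] {150, 8239, 9887, 51012}) h.add(v);
        // Prints the upper bound of the bucket holding 8239 (the 2nd of 4 samples).
        System.out.println(h.percentile(0.50));
    }
}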

Two of these histograms are kept: an "all time" histogram and a "recent" histogram. In the all time histogram, the buckets are simply incremented continuously since the time Cassandra started. It can be used to tell accurately how many events occurred in each bucket since the last time you looked, by taking the difference between the two readings. The all time histogram is what should be monitored and alerted on, since it is accurate. The "recent" histogram forward-decays the bucket values, so more recent values count exponentially more than earlier ones, giving an "about the last 15 minutes or so" view; it is not really for monitoring but for an ad hoc look at what things are like right now. Note: this recent histogram did not exist until 3.0.9/3.8; between 2.2 and then, cfhistograms reported all time values.
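
To illustrate why the all time histogram is the one to monitor, here is a small sketch (the bucket counts are illustrative, not real Cassandra output) showing that diffing two snapshots of its ever-increasing bucket counts yields an exact histogram for the interval between them.

import java.util.Arrays;

// Sketch only: because all time bucket counts only ever increase,
// subtracting an earlier snapshot from a later one is exact.
public class AllTimeDelta {

    static long[] delta(long[] earlier, long[] later) {
        long[] d = new long[later.length];
        for (int i = 0; i < later.length; i++) {
            d[i] = later[i] - earlier[i]; // events that landed in bucket i in between
        }
        return d;
    }

    public static void main(String[] args) {
        long[] snapshotT0 = {10, 40, 5, 0};  // bucket counts scraped at time t0
        long[] snapshotT1 = {12, 55, 9, 1};  // same buckets scraped at time t1
        // Exactly the events recorded per bucket between t0 and t1: [2, 15, 4, 1]
        System.out.println(Arrays.toString(delta(snapshotT0, snapshotT1)));
    }
}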

The "SSTables" column is the number of sstables touched on a read. What "touched" means changed in CASSANDRA-13120. Previously if checking the bloomfilter on an sstable meant possible disk IO so was included, but then it only filters out things by token range and timestamps. Now if a bloomfilter excludes an sstable from the read it is not counted. This is then kept in the 2 histograms mentioned above for the latencies.

Partition Size and Cell Count are generated based on the data on disk. Each sstable keeps histograms of the partition sizes and cell counts, calculated while it is being written. When reading this value for a table, the statistics from all the sstables are merged to generate the table-wide histogram used here in the percentile calculations.
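
As a sketch of that merge step (illustrative only, not Cassandra's code), merging per-sstable histograms that share the same bucket layout amounts to summing their bucket counts before running the percentile calculation:

import java.util.Arrays;
import java.util.List;

// Sketch only: per-sstable Partition Size / Cell Count histograms are combined
// into one table-wide histogram by summing counts bucket by bucket.
public class MergeSSTableHistograms {

    static long[] merge(List<long[]> perSSTableCounts) {
        long[] merged = new long[perSSTableCounts.get(0).length];
        for (long[] counts : perSSTableCounts) {
            for (int i = 0; i < merged.length; i++) {
                merged[i] += counts[i];
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] sstable1 = {3, 10, 2};  // hypothetical partition-size bucket counts
        long[] sstable2 = {1,  7, 5};
        // Table-wide counts used for the percentiles: [4, 17, 7]
        System.out.println(Arrays.toString(merge(List.of(sstable1, sstable2))));
    }
}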

answered Oct 12 '22 by Chris Lohfink