I see tabular data when I run 'nodetool cfhistograms':
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             2.00              0.00           8239.00               924                20
75%             4.00              0.00           9887.00              1109                20
95%             4.00              0.00          51012.00              1916                24
98%             4.00              0.00          51012.00              2299                29
99%             4.00              0.00          51012.00              2759                35
Min             0.00              0.00            150.00                73                 2
Max             4.00              0.00          51012.00              3973                60
Could someone please explain how these are calculated? I understand the percentile concept, but I want to know how many reads/writes are considered to produce the result above.
It's now nodetool tablehistograms. Each table has a histogram for reads and one for writes, which gets updated on completion of a local read/write. This does not include the network time spent waiting for replicas to meet the consistency level and so on; that is covered by nodetool proxyhistograms.
There's a bit of history here, and the output changed over time, so explaining it depends on the version of Cassandra. I gave a talk at the summit a couple of years ago that can explain some of the "whys". For a while (2.1 only), cfhistograms was reported using Metrics' exponentially decaying reservoirs, which are very inaccurate. Before 2.1 the cfhistograms output was displayed completely differently, but at this point that is not worth covering.
Currently they are represented by real histograms, not reservoirs (EstimatedHistogram). These histograms have fixed buckets, each one 20% larger than the previous. Since the buckets are fixed, the values are stored in a simple long[] (an AtomicLongArray or a LongAdder[], depending on the version). Recording a value just means identifying which bucket holds it and incrementing that bucket's counter, so in the worst case a value is reported 20% higher than it actually was. The percentiles are then calculated from this histogram using standard mechanisms.
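To make the bucket mechanics concrete, here is a minimal, hypothetical Java sketch of such a fixed-bucket histogram. The real class is org.apache.cassandra.utils.EstimatedHistogram; the class and method names below are made up and the logic is simplified (the real implementation uses a binary search over the offsets, for example):

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified sketch of an EstimatedHistogram-style structure.
public class FixedBucketHistogram {
    private final long[] offsets;          // upper bound of each bucket
    private final AtomicLongArray counts;  // one counter per bucket, plus an overflow slot

    public FixedBucketHistogram(int bucketCount) {
        offsets = new long[bucketCount];
        long last = 1;
        for (int i = 0; i < bucketCount; i++) {
            offsets[i] = last;
            // each bucket bound is ~20% larger than the previous one
            last = Math.max(last + 1, (long) Math.ceil(last * 1.2));
        }
        counts = new AtomicLongArray(bucketCount + 1); // last slot catches overflow
    }

    // Find the first bucket whose upper bound covers the value and increment it.
    public void add(long value) {
        int i = 0;
        while (i < offsets.length && offsets[i] < value)
            i++;
        counts.incrementAndGet(i);
    }

    // Standard percentile over the buckets: walk them until the cumulative
    // count passes percentile * total, then report that bucket's upper bound.
    public long percentile(double percentile) {
        long total = 0;
        for (int i = 0; i < counts.length(); i++)
            total += counts.get(i);
        if (total == 0)
            return 0;
        long threshold = (long) Math.ceil(percentile * total);
        long seen = 0;
        for (int i = 0; i < offsets.length; i++) {
            seen += counts.get(i);
            if (seen >= threshold)
                return offsets[i]; // worst case ~20% above the true value
        }
        return Long.MAX_VALUE; // value landed in the overflow bucket
    }
}
```

Because the result is always a bucket's upper bound, this is where the "at most 20% worse than reality" error bound in the paragraph above comes from.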
There are two of these histograms kept: an "all time" histogram and a "recent" histogram. In the all-time histogram the buckets are simply incremented continuously since the time Cassandra started. It can be used to tell accurately how many events occurred in each bucket since the last time you looked, by taking the difference between the two readings. This all-time histogram is what should be monitored and alerted on, as it is accurate; a sketch of that diffing approach follows below. The "recent" histogram forward-decays the bucket values, so more recent values count exponentially more than older ones, giving an "about the last 15 minutes-ish" view. It is not really for monitoring, but for an ad hoc look at what things are like right now. Note: this recent histogram didn't exist until 3.0.9/3.8; between 2.2 and those versions, cfhistograms reported all-time values.
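As a hedged illustration of why the all-time histogram is the one to monitor: because its counters only ever grow, a polling agent can subtract consecutive snapshots to get exactly the events that landed in each bucket during the interval. HistogramDelta below is a made-up name, and the plumbing for actually reading the bucket counts (e.g. via JMX) is omitted:

```java
// Hypothetical sketch: turn all-time bucket counts into per-interval counts.
public final class HistogramDelta {
    private long[] previous;

    // `current` is a copy of the all-time bucket counts from the latest poll.
    public long[] sinceLastPoll(long[] current) {
        long[] delta = new long[current.length];
        if (previous != null) {
            for (int i = 0; i < current.length; i++)
                delta[i] = current[i] - previous[i]; // counters only ever grow
        }
        previous = current.clone();
        return delta;
    }
}
```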
The "SSTables" column is the number of sstables touched on a read. What "touched" means changed in CASSANDRA-13120. Previously if checking the bloomfilter on an sstable meant possible disk IO so was included, but then it only filters out things by token range and timestamps. Now if a bloomfilter excludes an sstable from the read it is not counted. This is then kept in the 2 histograms mentioned above for the latencies.
Partition Size and Cell Count are generated from the data on disk. Each sstable keeps histograms of the partition sizes and cell counts, calculated while the sstable is being written. When reading this value for a table, the statistics from all of the sstables are merged to generate the table-wide histogram used here in the percentile calculations.
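Since every one of these histograms shares the same fixed bucket boundaries, the merge is just an element-wise sum of the bucket counts. A minimal, hypothetical sketch (mergeBucketCounts is a made-up helper; it assumes all arrays have the same length, as they would with shared bucket offsets):

```java
import java.util.List;

// Hypothetical sketch: merge per-sstable bucket counts into a table-wide
// histogram by summing the counters bucket by bucket.
public static long[] mergeBucketCounts(List<long[]> perSstableCounts) {
    long[] merged = null;
    for (long[] counts : perSstableCounts) {
        if (merged == null)
            merged = counts.clone();
        else
            for (int i = 0; i < merged.length; i++)
                merged[i] += counts[i];
    }
    return merged;
}
```

The percentiles reported for Partition Size and Cell Count are then computed over this merged histogram, the same way as for the latency histograms.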